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A  Variational  Framework  for  Exemplar-Based  Image 
Inpainting 

Pablo  Arias  •  Gabriele  Facciolo  •  Vicent  Caselles  •  Guillermo  Sapiro 


Abstract  Non-local  methods  for  image  denoising  and 
inpainting  have  gained  considerable  attention  in  recent 
years.  This  is  in  part  due  to  their  superior  performance 
in  textured  images,  a  known  weakness  of  purely  lo¬ 
cal  methods.  Local  methods  on  the  other  hand  have 
demonstrated  to  be  very  appropriate  for  the  recovering 
of  geometric  structures  such  as  image  edges.  The  syn¬ 
thesis  of  both  types  of  methods  is  a  trend  in  current 
research.  Variational  analysis  in  particular  is  an  appro¬ 
priate  tool  for  a  unified  treatment  of  local  and  non-local 
methods.  In  this  work  we  propose  a  general  variational 
framework  non-local  image  inpainting,  from  which  im¬ 
portant  and  representative  previous  inpainting  schemes 
can  be  derived,  in  addition  to  leading  to  novel  ones.  We 
explicitly  study  some  of  these,  relating  them  to  previous 
work  and  showing  results  on  synthetic  and  real  images. 

Keywords  Inpainting  •  Variational  methods  •  Self¬ 
similarity  •  Non-local  methods  •  Exemplar-based 
methods 


1  Introduction 

Image  inpainting ,  also  known  as  image  completion  or 
disocclusion ,  is  an  active  research  area  in  the  image 
processing  field.  The  purpose  of  inpainting  is  to  ob¬ 
tain  a  visually  plausible  image  interpolation  in  a  region 
in  which  data  are  missing  due  to  damage  or  occlusion. 
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Usually,  to  solve  this  problem,  the  only  available  data 
is  the  image  outside  the  region  to  be  inpainted.  In  ad¬ 
dition  to  its  theoretical  interest,  image  inpainting  has 
applications  to  image  and  video  editing  and  restoration. 

Most  inpainting  methods  found  in  the  literature  can 
be  classified  into  two  groups:  geometry-  and  texture- 
oriented  methods.  We  now  briefly  review  the  develop¬ 
ments  in  both  types  of  approaches,  with  emphasis  in 
texture-oriented  methods.  This  review  will  helpful  for 
motivating  the  proposed  formulation. 

Geometry- oriented  methods.  Images  are  modeled  as  func¬ 
tions  with  some  degree  of  smoothness,  expressed  for  in¬ 
stance  in  terms  of  the  curvature  of  the  level  lines  or  the 
total  variation  of  the  image.  The  interpolation  is  per¬ 
formed  by  continuing  and  imposing  this  model  inside 
the  inpainting  domain,  usually  by  means  of  a  partial 
differential  equation  (PDE).  Such  PDE  can  be  derived 
from  variational  principles,  as  for  instance  in  [6,19,20, 
28,44,45],  or  inspired  by  physical  processes  [8,11,54]. 
These  methods  show  good  performance  in  propagating 
smooth  level  lines  or  gradients,  but  fail  in  the  presence 
of  texture.  They  are  often  referred  to  as  structure  or 
cartoon  inpainting. 

Geometry-oriented  methods  are  local  in  the  sense 
that  the  associated  PDEs  only  involve  interactions  be¬ 
tween  neighboring  pixels  on  the  image  grid.  An  impli¬ 
cation  of  this  is  that  among  all  the  data  available  in  the 
image,  these  methods  only  use  that  around  the  bound¬ 
ary  of  the  inpainting  domain. 

Texture- oriented  methods.  Texture-oriented  inpainting 
was  born  as  an  application  of  texture  synthesis,  e.g.,  [26, 
35].  Its  recent  development  was  triggered  in  part  by  the 
works  of  Efros  and  Leung  [26]  and  Wei  and  Levoy  [55] 
using  non-parametric  sampling  techniques  (parametric 
models  have  also  been  considered,  e.g.  [40]).  In  these 
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works  texture  is  modeled  as  a  two  dimensional  proba¬ 
bilistic  graphical  model ,  in  which  the  value  of  each  pixel 
is  conditioned  by  its  neighborhood.  These  approaches 
rely  directly  on  a  sample  of  the  desired  texture  to  per¬ 
form  the  synthesis. 

In  practice  these  methods  work  progressively  by  ex¬ 
panding  a  region  of  synthesized  texture.  The  value  of 
a  target  pixel  x  is  copied  from  the  center  of  a  (square) 
patch  in  the  sample  image,  chosen  among  those  that 
best  match  the  available  portion  of  the  patch  centered 
at  x.  Levina  and  Bickel  [41]  provided  a  probabilistic 
theoretical  justification  for  this  strategy. 

This  method  (as  explained  above  or  with  some  mod¬ 
ifications)  has  been  extensively  used  for  inpainting  [9, 
10,22,25,26,48].  As  opposed  to  geometry-oriented  in¬ 
painting,  these  so-called  exemplar-based  approaches,  are 
non-local :  To  determine  the  value  at  x,  the  whole  image 
may  be  scanned  in  the  search  for  a  matching  patch. 

Since  these  texture  approaches  are  greedy  proce¬ 
dures  (each  hole  pixel  is  visited  only  once),  the  results 
are  very  sensitive  to  the  order  in  which  pixels  are  pro¬ 
cessed  [22].  This  issue  was  addressed  in  [39,56]  where 
the  inpainting  problem  is  stated  as  a  probabilistic  infer¬ 
ence  problem  in  a  graphical  model.  In  both  cases,  the 
image  is  modeled  using  a  pair-wise  Markov  Random 
Field  (MRF).  In  [39]  an  energy  considering  the  over¬ 
lap  error  of  adjacent  patches  is  minimized  using  belief 
propagation.  The  method  in  [56]  can  be  seen  as  an  ap¬ 
proximated  Expectation-Maximization  (EM)  method. 
A  similar  scheme  was  further  explored  in  [38] ,  however 
with  a  different  variational  justification,  closely  related 
to  the  models  addressed  in  this  work. 

The  algorithms  of  [38,56]  require  an  initial  com¬ 
pletion  which,  as  it  turns  out,  has  a  great  impact  on 
the  final  result.  In  both  cases  the  authors  resort  to  a 
multiscale  approach :  The  image  is  filtered  and  subsam¬ 
pled,  possibly  several  times.  Starting  from  the  coarsest, 
the  inpainting  algorithm  is  then  applied  sequentially 
to  each  scale,  using  the  upsampled  result  of  the  pre¬ 
vious  scale  as  initialization.  This  so-called  multiscale 
approach  yields  very  good  results,  and  has  been  also 
applied  successfully  to  other  exemplar-based  methods 
[30]. 

Another  variational  justification  for  texture-based 
methods  was  presented  in  [24],  where  the  inpainting 
problem  is  reformulated  as  that  of  finding  a  correspon¬ 
dence  map  r  :  O  — >  Oc,  O  being  the  inpainting  domain 
and  Oc  its  complement  w.r.t.  the  image  domain.  De¬ 
noting  the  image  by  u,  the  inpainted  value  at  position 
x  G  O  is  then  given  by  u(x)  =  u{T{x))^  being  T(-) 
the  correspondence  map.  The  authors  proposed  a  con¬ 
tinuous  energy  functional  in  which  the  unknown  is  the 


correspondence  map  itself: 

E(r)=  [  [  (u(r(x-y))-u(r(x)-y))2dydx, 

J  O  J  f2p 

where  fip  is  the  patch  domain  (centered  at  (0,  0)).  Thus 
r  should  map  a  pixel  x  and  its  neighbors  in  such  a  way 
that  the  resulting  patch  is  close  to  the  one  centered  at 
r(x).  This  model  has  been  the  subject  of  further  anal¬ 
ysis  by  Aujol  et  al  [4],  where  extensions  are  proposed 
and  the  existence  of  a  solution  in  the  set  of  piecewise 
roto-translation  maps  r  is  proved.  The  authors  also 
discuss  the  use  of  different  regularizers  which  include 
a  regularization  of  the  curvature  of  the  level  lines  (or 
surfaces)  of  the  image.  These  approaches  are  theoreti¬ 
cal  and  no  numerical  optimization  scheme  is  available 
so  far. 

A  different  variational  model  was  presented  in  [49]. 
Images  are  modeled  as  ensembles  of  patches  on  a  given 
patch  manifold.  For  inpainting,  the  patch  manifold  can 
be  learned  from  the  set  of  patches  in  the  hole’s  com¬ 
plement.  The  method  is  iterative,  with  each  iteration 
having  two  steps.  First,  the  patches  in  the  hole  are  pro¬ 
jected  onto  the  manifold.  Since  this  is  done  for  each 
patch  independently,  the  projected  patches  are  not  nec¬ 
essarily  coherent  with  each  other,  i.e.  overlapping  patches 
may  have  different  values  in  the  overlap  region.  There¬ 
fore,  in  the  second  step,  an  image  is  computed  by  aver¬ 
aging  the  patches  in  the  ensemble. 

Another  front  of  activity  is  given  by  the  techniques 
based  on  the  sparseland  model  [1,16],  in  which  the  im¬ 
age  is  restricted  to  have  a  sparse  representation  over  an 
overcomplete  basis  or  dictionary  [1,27,43].  The  main 
difference  between  dictionary-based  and  exemplar-based 
methods  lies  in  where  the  missing  information  is  ob¬ 
tained  from.  Dictionary  based  methods  look  for  the 
missing  data  in  the  dictionary  (as  a  linear  combina¬ 
tion  of  a  few  atoms),  whereas  exemplar-based  methods 
assume  that  the  information  needed  lies  elsewhere  in 
the  image  itself  (or  in  a  database  of  images  [33] ) .  These 
methods  perform  well  in  problems  with  small  or  scat¬ 
tered  inpainting  domains,  but  fail  with  a  large  holes. 

Exemplar-based  methods  provide  impressive  results 
in  recovering  textures  and  repetitive  structures.  How¬ 
ever,  their  ability  to  recreate  the  geometry  without  any 
example  is  limited  and  not  well  understood.  Therefore, 
different  strategies  have  been  proposed  which  combine 
geometry  and  texture  inpainting  [9,17,25,37].  These 
methods  usually  decompose  the  image  in  some  sort  of 
structure  and  texture  components.  Structure  is  recon¬ 
structed  using  some  geometry-oriented  scheme,  and  this 
is  used  to  guide  the  texture  inpainting. 

Contributions.  Despite  these  combined  methods,  geom¬ 
etry  and  texture  inpainting  are  still  quite  separate  fields, 
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each  one  with  its  own  analysis  and  implementation  tools. 
Variational  models  as  the  one  introduced  in  this  pa¬ 
per  can  provide  common  tools  allowing  a  unified  treat¬ 
ment  of  both  trends.  We  therefore  propose  a  variational 
framework  for  non-local  image  inpainting  as  a  contri¬ 
bution  to  the  modeling  and  analysis  of  texture-oriented 
methods. 

Like  the  non-local  means  denoising  algorithm  [5, 
14]  we  encode  the  image  redundancy  and  self-similarity 
(measured  as  patch  similarity)  as  a  non-local  weight 
function  w  :  OxOc  — »  M.  This  function  serves  as  a  fuzzy 
correspondence,  and  differs  from  the  works  [4,24],  al¬ 
though  a  (eventually  multivalued)  correspondence  map 
can  be  approximated  as  a  limit  of  our  model.  The  pro¬ 
posed  formulation  is  rather  general  and  different  in¬ 
painting  schemes  can  be  derived  naturally  from  it,  via 
the  selection  of  the  appropriate  patch  similarity  crite¬ 
rion.  In  this  work  we  present  four  of  them,  patch  NL- 
means ,  - medians ,  -Poisson  and  -gradient  medians ,  cor¬ 
responding  to  similarity  criterions  based  on  L 2-  and 
Id-norms  between  patches  or  their  gradients.  We  pro¬ 
vide  an  comprehensive  empirical  comparison  on  real 
and  synthetic  problems,  showing  the  benefits  and  limi¬ 
tations  of  each  model. 

The  patch  NL-means  is  related  to  the  method  of  [38, 
56]  and  can  be  interpreted  in  terms  of  the  mean  shift 
[21]  and  the  manifold  models  of  [49].  The  other  schemes 
are,  to  the  best  of  our  knowledge,  novel.  Both  gradient- 
based  methods,  patch  NL-Poisson  and  -gradient  me¬ 
dians  combine  the  exempar-based  interpolation  with 
local  PDE-based  diffusion  schemes.  This  results  in  a 
smoother  continuation  of  the  information  across  the 
boundary  and  inside  the  inpainting  domain,  and  in  a 
better  propagation  of  structures. 

Finally,  we  also  present  an  initial  discussion  along 
the  line  of  providing  a  formal  modelling  of  the  multi¬ 
scale  scheme.  We  argue  that  this  approach  is  not  just 
some  heuristic  to  find  a  good  minimum  of  the  single 
scale  inpainting  functional,  but  constitutes  an  inpaint¬ 
ing  criterion  itself. 

Although  the  focus  of  this  work  lies  on  inpainting, 
the  framework  we  are  introducing  can  be  adapted  for 
its  application  in  other  contexts.  In  particular  it  is  re¬ 
lated  to  graph-based  non-local  regularizes.  Inspired  by 
graph  regularization  techniques  [57],  these  approaches 
model  the  image  as  a  graph  characterized  by  the  sim¬ 
ilarity  weights  [32,42].  Recently,  our  formalism  [3]  was 
extended  by  Peyre  et  al.  in  [51]  to  provide  a  variational 
justification  to  the  case  in  which  the  graph  is  adaptive. 
On  the  other  hand  the  patch  NL-means  scheme  pro¬ 
vides  a  model  for  an  iterative  version  of  the  non-local 
means  algorithm  with  adaptive  similarity  weights.  Sim¬ 
ilar  approaches  have  been  applied  to  denoising  [5,52], 


super-resolution  [53],  texture  denoising  [12],  and  demo- 
saicing  [15]  among  others.  We  will  discuss  the  relation 
with  these  models  in  the  text. 

The  rest  of  this  document  is  organized  as  follows.  In 
Section  2  we  introduce  the  proposed  variational  frame¬ 
work,  together  with  the  derivation  and  the  empirical 
comparison  of  the  different  inpainting  schemes.  The 
links  with  related  work  are  discussed  in  Section  3.  In 
Section  4  we  present  and  discuss  the  multiscale  ap¬ 
proach.  And  in  Section  5  we  present  experimental  re¬ 
sults  on  real  images  allowing  to  compare  our  results 
with  the  state  of  the  art.  Concluding  remarks  and  fu¬ 
ture  work  is  discussed  in  Section  6. 

We  note  that  a  preliminary  version  of  this  work  has 
been  presented  in  [3]. 

Notation.  Images  are  denoted  as  functions  u  :  Q  — >  M, 
where  Q  denotes  the  image  domain,  usually  a  rectangle 
in  M2.  Pixel  positions  are  denoted  by  x,  x ,  z,  z  or  y, 
the  latter  for  positions  inside  the  patch.  A  patch  of  u 
centered  at  x  is  denoted  by  pu{%)  =  pu(x,m)  :  — >  M, 
where  ftp  is  a  rectangle  centered  at  (0,  0).  The  patch  is 
defined  by  pu(x,y)  =  u(x  +  ?/),  with  y  E  f2p.  O  C  Q 
is  the  hole  or  inpainting  domain,  and  Oc  =  f2\0.  We 
still  denote  by  u  the  part  of  the  image  u  inside  the  hole, 
while  u  is  the  part  of  u  in  Oc :  u  =  u\o c-  Additional 
notation  will  be  introduced  in  the  text. 

2  Variational  framework 

Our  variational  framework  is  inspired  by  the  following 
non-local  functional 

Fw(u)  =  /  /  w(x,x)(u(x)  —  u(x))2  dxdx  (1) 

Jo  Joc 

where  w  :  O  x  Oc  — >  M+  is  a  weight  function  that 
measures  the  similarity  between  patches  centered  in  the 
inpainting  domain  and  in  its  complement. 

Let  us  assume  for  the  moment  that  the  weights  are 
known.  The  minimum  of  (1)  should  have  a  low  pixel 
error  (u(x)  —  u(x))2  whenever  the  similarity  w(x,x)  is 
high.  In  this  way  the  similarity  weights  drive  the  in¬ 
formation  transfer  from  known  to  unknown  pixels.  A 
similar  functional  was  proposed  in  [31]  as  a  non-local 
regularization  energy  in  the  context  of  image  denois¬ 
ing.  It  models  the  non-local  means  filter  [5,14]  when 
the  weights  are  Gaussian 

w(x,x)  oc  exp  (kl|| Pu(x)  ~Pu(x) ||2^  , 

where  ||  •  ||  is  a  weighted  L2-norm  in  the  space  of  patches 
and  h  is  a  parameter  that  determines  the  selectivity  of 
the  weigths  w. 
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In  [31]  the  weights  are  considered  known  and  remain 
fixed  through  all  the  iterations.  While  this  might  be  ap¬ 
propriate  in  applications  where  they  can  be  estimated 
from  the  noisy  image,  in  the  image  inpainting  scenario 
here  addressed,  the  weights  are  not  available  and  have 
to  be  inferred  together  with  the  image  (as  in  [50,53]). 
One  of  the  novelties  of  the  proposed  framework  is  the 
inclusion  of  adaptive  weights  in  a  variational  setting. 

For  this  reason,  we  will  consider  the  weight  function 
w  as  an  additional  unknown.  Instead  of  prescribing  ex¬ 
plicitly  the  Gaussian  functional  dependence  of  w  w.r.t. 
u ,  we  will  do  it  implicitly,  as  a  component  of  the  opti¬ 
mization  process.  In  doing  so,  we  obtain  a  simpler  func¬ 
tional,  avoiding  to  deal  with  the  complex,  non-linear  de¬ 
pendence  between  w  and  u.  In  our  formulation,  w(x,-) 
is  a  probability  density  function, 


and  can  be  seen  as  a  relaxation  of  the  one-to-one  corre¬ 
spondence  map  of  [4,24],  providing  a  fuzzy  correspon¬ 
dence  between  each  x  G  O  and  the  complement  of  the 
inpainting  domain. 

In  this  setting,  we  propose  an  energy  which  contains 
two  terms,  one  of  them  is  inspired  by  (1)  and  measures 
the  coherence  between  the  pixels  in  O  and  those  in  Oc, 
for  a  given  similarity  weight  function  w.  This  permits 
the  estimation  of  the  image  u  from  the  weights  w.  The 
second  term  allows  us  to  compute  the  weights  given  the 
image.  The  complete  proposed  functional  is 


w)  —  ^  Fw 


Hw{x)  dx, 


(2) 


Let  us  now  make  some  additional  comments  on  the 
functional.  We  observe  that  the  term  (u(x)  —  u(x))2  in 
(1),  that  penalizes  differences  between  pixels,  is  substi¬ 
tuted  in  (3)  by  the  patch  error  function  e(pu{x)—pu{x)). 
This  has  two  consequences.  First,  minimizing  (3)  with 
respect  to  the  image  will  force  patches  pu(x)  to  be  simi¬ 
lar  to  pu{x)  for  each  pair  x,  x  such  that  w(x,  x)  is  large. 
The  other  implication  has  to  be  understood  together 
with  the  inclusion  of  the  second  term,  which  integrates 
the  entropy  of  each  probability  w(x,-)  over  O.  For  a 
given  completion  u ,  and  for  each  x  E  O,  the  optimum 
weights  minimize  the  mean  patch  error  for  pu{x),  given 

by 


L 


w(x,x)e(pu(x)  -pa(x))dx, 


while  maximizing  the  entropy.  For  instance  taking  s  as 
the  squared  L2-norm  of  the  patch,  then  the  resulting 
weights  are  Gaussian.  This  can  be  related  to  the  princi¬ 
ple  of  maximum  entropy  [36] ,  widely  used  for  inference 
of  probability  distributions.  The  parameter  h  controls 
the  trade-off  between  both  terms  and  is  also  the  selec¬ 
tivity  parameter  of  the  Gaussian  weights.  The  trivial 
minima  of  E  with  w(x,x)  =  0  everywhere  is  discarded 
by  restricting  w(x,-)  to  be  a  probability. 


The  patch  error  function.  Patches  are  functions  defined 
on  and  the  error  function  e  :  — »  M+  is  defined 

as  the  weighted  sum  of  pixel- wise  errors  e  :  M  — >  M+ 

e{pu(x)  ~  Pu{x))  :=  /  ga(y)e(u(x  +  y)-  u(x  +  y))dy, 

J  Qp 

(4) 


where 

Fw(u)  =  w(x,x)s(pu(x)  -pu(x))dxdx,  (3) 

Jo  Joc 

e(-)  is  an  error  function  for  image  patches,  and 
Hw(x)  =  —  /  w(x,  x )  logre(x,  x)dx, 

Joc 

is  the  entropy  of  the  probability  w(x,-). 

We  take  O,  the  extended  inpainting  domain ,  as  the 
set  of  centers  of  patches  that  intersect  the  hole,  i.e. 
O  =  0+f2p  =  {x  G  Q  :  (r+f]p)nO  /  0}.  Thus,  patches 
Pu(x)  centered  in  x  G  Oc  are  entirely  outside  O  (Figure 
1).  Accordingly,  we  consider  that  the  weight  function 
w  is  defined  over  O  x  Oc  and  J~c  w{x,x)dx  =  1.  For  a 
simplified  presentation,  we  assume  that  O  +  fip  C  j?, 
i.e.  every  pixel  in  O  supports  a  patch  centered  on  it 
and  contained  in  Q.  Analogously,  we  also  shrink  Oc  to 
have  Oc  +  f2p  C  Q. 


where  the  intra-patch  weight  function  ga  is  a  Gaussian 
centered  at  the  origin  with  standard  deviation  a.  We 
will  consider  the  L1-  and  the  squared  L2-norms  as  par¬ 
ticular  cases  of  e(-),  with  e(-)  =  |  •  |  and  e(-)  =  |  •  |2 
respectively. 

We  will  also  consider  patch  error  functions  involv¬ 
ing  the  gradient  of  the  image.  In  an  abuse  of  notation 
we  will  denote  the  gradient’s  patch  and  pixel- wise  error 
functions  as  e  :  M2  p  — ►  M+  and  e  :  M2  — >  M+,  respec¬ 
tively.  It  will  be  clear  from  the  argument  which  case 
is  intended,  as  in  this  example  s(p\/u(x)  —  p\7u{%))  := 
Io9a(y)e(yu(x  +  y)  -  Vu(x  +  y))dy. 

Additionally,  we  can  impose  an  additive  penaliza¬ 
tion  of  the  distance  p(x  —  x)  as  in  [38]  by  modifying 
the  patch  distance  function  £d{Pu{%)  —  pu(x),x,x)  = 
z{Pu(%)  —  Pu{x))  +  v{x  —  x)  t°  penalize  the  use  of  dis¬ 
tant  patches.  We  will  not  consider  this  modification  in 
the  current  work. 

As  it  will  be  discussed  below,  the  patch  error  func¬ 
tion  determines  not  only  the  similarity  criterion  but 
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also  the  image  synthesis,  and  thus  is  a  key  element  in 
the  proposed  framework. 


2.1  Minimization  of  E 

We  have  formulated  the  inpainting  problem  as  the  con¬ 
strained  optimization 

(id,  re*)  =  argminZ?(iq  w)  (5) 

u,w 

subject  to  w(x,x)dx  =  1  \/x  G  O, 

Joc 

where  E  is  the  inpainting  energy  defined  in  (2).  For  en¬ 
ergies  involving  image  gradients  we  will  consider  Dirich- 
let  conditions  at  the  boundary  between  the  inpainting 
domain  and  the  image,  and  Neumann  conditions  at  the 
boundary  of  the  image,  i.e.  u(x)  =  u(x)  in  dO\dI2  and 
\7u(x)  •  n{pc)  =  0  for  x  G  dO  H  df2. 

To  minimize  the  energy  E ,  we  use  an  alternate  min¬ 
imization  algorithm.  At  each  iteration,  two  optimiza¬ 
tion  steps  are  solved:  The  constrained  minimization  of 
E  with  respect  to  w  while  keeping  u  fixed;  and  the 
minimization  of  E  with  respect  to  u  with  w  fixed.  This 
procedure  yields  the  following  iterative  scheme: 


Algorithm  1  Alternate  minimization  of  E(u,w). 
Require:  Initial  Condition  uq(x)  with  x  G  O. 

l:  repeat 

2:  Update  Weights :  Wk  =  argmin^  E(uk,  re), 

s.t.  fgc  w(x,  x)dx  =  1. 

3:  Update  Image :  Uk+i  =  argmin uE(u,Wk)- 

4:  until  Stop  Criterion :  H'M/c+i  —  Uk\\  <  tolerance. 


In  the  weights  update  step,  the  minimization  of  E 
w.r.t.  w  yields: 


wk(x,x) 


q(x)eXV  y~\£^Puk^  ~Pu(x)) 


(6) 


where  q(x)  =  Jqc  exp  (-£e(pUfc(x)  -  pa(x)))  is  a  nor- 
malization  factor  that  makes  w(x,-)  a  probability.  For 
the  functionals  defined  with  gradient  patches,  the  weight 
update  equation  is  analogous  to  (6)  replacing  the  patch 
error  function  with  e{p\/Uk{x)  —  p\/u(x)). 

The  parameter  h  determines  the  selectivity  of  the 
similarity.  If  h  is  large,  maximizing  the  entropy  be¬ 
comes  more  relevant,  yielding  weights  which  are  less 
selective.  In  the  limit,  when  h  — >  oo,  w(x,  •)  becomes 
a  uniform  distribution  over  Oc.  On  the  other  hand,  a 
small  h  yields  weights  which  concentrate  on  the  patches 
similar  to  pu(x).  When  h  0,  we  compute  the  weights 


as  lim^-,0  x)  =  \n\x)\  —  nix))i  where  n(x)  C  Oc 

is  the  set  of  nearest  neighbors  of  x,  defined  as 

n{x)  =  argmin 6 (pu(x)  -  Pu(x )). 

xEO 

That  is  w(x,-)  can  be  considered  as  an  approxima¬ 
tion  to  a  multivalued  correspondence.  For  simplicity, 
in  practice  we  assume  that  \n(x)\  =  1,  i.e.  the  nearest 
neighbor  is  unique. 

The  image  update  step  deserves  more  attention  and 
is  described  next. 


2.1.1  Image  update  step. 


In  this  section  we  present  the  derivation  of  the  im¬ 
age  update  step  corresponding  to  the  four  patch  er¬ 
ror  functions  mentioned  earlier.  First  we  will  present 
the  cases  when  image  patches  are  compared  using  the 
squared  L2-norm  and  the  id-norm.  We  refer  to  the  re¬ 
sulting  algorithms  as  patch-wise  non-local  means  (patch 
NL-means),  and  medians  (patch  NL-medians).  Then 
we  consider  functionals  involving  the  gradients  of  the 
patches  both  with  the  squared  L2-norm  and  the  id- 
norm.  These  methods  will  be  referred  as  patch-wise 
non-local  Poisson  (patch  NL-Poisson),  and  gradient  me¬ 
dians  (patch  NL-GM). 

Before  moving  to  the  derivation  of  the  these  schemes, 
let  us  remark  that  with  the  change  of  variables  2  = 
x  -\-  y,  z  =  x  +  y' ,  the  image  energy  term  can  be  ex¬ 
pressed  as  an  accumulation  of  pixel- wise  errors: 


Fw(u)  = 


o  Joc 


w(x,  x) 


/  ga(y)e(u(x  Eg)-  u(x  +  y))dydxdx 
J  fip 

/  /  m(z,  z)e(u(z)  —  u(z))dzdz  +  C,  (7) 

Jo  Joc 


where  C  is  a  constant  term.  We  have  introduced  the 
pixel-wise  influence  weights  m  :  O  x  Oc  — »  M+  defined 
as 


Xoc(2  -  y)ga(y)w(z  -y,z-  y)dy. 


(8) 


The  function  takes  the  value  1  on  Oc  and  0  on  O. 
An  analogous  expression  can  be  computed  for  gradient 
patch  error  functions.  This  rewriting  simplifies  the  fol¬ 
lowing  derivations  and  provides  some  insights  on  the 
implications  of  using  patch- wise  errors. 

For  each  pair  of  pixels  (z,z)  G  O  x  Oc,  m(z,z) 
weights  the  effective  contribution  of  the  pixel- wise  error 
between  u(z)  and  u(z)  in  the  total  value  of  the  energy. 
The  quantity  m(z,z)  is  computed  by  integrating  the 
similarity  w{z—y ,  z—y)  between  all  patches  that  overlap 
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Fig.  1  Patch-wise  non-local  inpainting.  The  value  at  z  G  O 
is  computed  using  contributions  from  all  the  patches  that  overlap 
it,  these  are  patches  centered  at  x  G  O  such  that  z  =  x  +  y 
with  y  G  Op.  The  influence  function  m(z,z)  accumulates  all  the 
contributions  w(z  —  y:  z  —  y)  from  patches  centered  at  z  —  y  to 
z-y. 


z  and  those  that  overlap  2  in  the  same  relative  position 
(shown  in  Figure  1).  It  tells  us  how  much  evidence  there 
is  supporting  a  correspondence  between  the  locations  2 
and  z.  Also  observe  that  for  each  z,  fQc  m{z ,  z)dz  =  1, 
then  m(z,-)  can  also  be  interpreted  as  a  probability 
density  function.  Note  that  the  energy  (7)  corresponds 
to  (1),  with  the  patch  similarity  weights  w  being  sub¬ 
stituted  with  the  pixel- wise  influence  weights  ra,  a  sort 
of  spacial  convolution  of  w  with  kernel  ga. 


Patch  non-local  means.  If  we  use  a  weighted  squared 
L2-norm  as  a  patch  error  function  s(pu(x)  —  Pu(xj)  := 
fn  ga(y)\u(x  +  y)  —  u(x  +  y)\2dy  in  (3)  then  the  image 
energy  term  (7)  is  quadratic  on  u,  and  its  minimum  for 
fixed  weights  w  can  be  computed  explicitly  as  a  non¬ 
local  average: 

u(z)  =  — [  m(z,  z)u(z)dz,  (9) 

c{z)  Jo c 

for  z  G  O,  where  the  normalization  constant  c(z)  := 
fQc  m(z,z)dz.  Although  c(z)  =  1  with  the  current  def¬ 
inition  of  w  and  ga,  we  will  keep  a  generic  notation  in 
the  following  derivations. 

The  formal  similarity  with  the  non-local  means  equa¬ 
tion  hides  some  important  differences,  which  are  a  di¬ 
rect  consequence  of  the  use  of  a  patch  error  function  in 
the  image  energy  term.  To  obtain  more  insight  about 
this  let  us  expand  m  to  obtain: 

u(z )  =  At  [  9a(y)  f  w(z  -  y, x)u(x  +  y)dxdy. 
c\z)  Jnv  Joc 

There  are  two  averaging  processes  involved  in  the  syn¬ 
thesis.  The  outer  integral  goes  through  all  patches  pu{z— 
y)  overlapping  the  target  pixel  z.  Each  patch  suggests 
a  value  for  2  resulting  from  the  inner  sum:  A  non-local 
average  of  the  pixel  at  position  y  in  all  patches  Pu(x ) 
in  Oc.  This  sum  is  weighted  by  the  similarity  between 
the  patch  pu(z  —  y)  and  each  Pu(x). 


Therefore,  we  can  distinguish  two  types  of  pixel 
interactions.  Interactions  due  to  the  patch  overlap  of 
nearby  pixels  in  the  image  lattice  and  non-local  inter¬ 
actions  driven  by  the  similarity  weights.  The  latter  can 
be  controlled  by  the  selectivity  parameter  ft,  but  the 
extent  of  the  overlap  interactions  is  given  by  the  patch 
size.  In  particular  when  h  — »  0  Eq.  (9)  yields 

u(z)=  9a(y)u{n{z  -  y)  +  y)dy 
J  Qp 

(recall  we  are  assuming  a  unique  nearest  neighbor).  For 
each  x  G  O,  the  nearest  neighbor  of  pu(x)  is  centered 
at  x  and  all  these  patches  are  averaged  according  to  ga. 
This  blending  may  cause  some  blur,  which  leads  us  to 
consider  the  id-norm  in  the  search  of  a  more  robust 
image  synthesis. 

In  Appendix  A  we  prove  the  existence  of  minima  for 
the  Patch  NL-Means  energy  in  the  continuous  setting. 

Patch  non-local  medians.  The  id-norm  patch  error  func¬ 
tion  in  the  image  energy  term  corresponds  to  taking 
e(x)  =  \x\  in  (7).  The  Euler  equation  for  u ,  given  the 
influence  function  m,  can  be  formally  written  as 

[5uFw(u)](z)  =  /  sign[u(z) -u(z)]m(z,z)dz3  0. 

Joc 

This  expression  is  multivalued,  since  sign(r)  =  r/\r\  if 
|r|  >  0  and  sign(r)  G  [—1,1]  if  r  =  0.  Its  solution  for 
each  u(z),z  G  O  is  obtained  as  a  weighted  median  of 
the  pixels  of  the  complement  Oc,  with  weights  m{z ,  •). 

Both  schemes  presented  so  far  perform  inpainting 
by  transferring  (by  averages  or  medians)  known  gray 
levels  into  the  inpainting  domain.  As  we  will  see  next, 
using  a  patch  error  function  based  on  the  gradient  of 
the  image  yields  methods  which  transfer  gradients  and 
compute  the  resulting  image  as  the  solution  of  a  PDE. 
This  results  in  better  continuation  properties  of  the  so¬ 
lution,  in  particular  at  the  boundary  of  the  inpainting 
domain. 

Patch  non-local  Poisson.  The  squared  id-norm  of  the 
gradient  in  the  image  energy  term  (3)  corresponds  to 
taking  e(-)  =  ||  •  ||2  in  (7),  where  ||  •  ||  is  the  Euclidean 
norm  in  M2.  The  energy  term  becomes 

Fw(u)  =  f  f  m(z,  z)\\\7u(z)  -  \7u(z)\\2dzdz.  (10) 

Jo  Joc 

Recall  that  we  consider  Dirichlet  conditions  at  the  bound¬ 
ary  between  the  inpainting  domain  and  the  known  data 
region,  and  Neumann  conditions  at  the  boundary  of  the 
image,  i.e.  u(x)  =  u(x)  in  dO\dQ  and  Vu(x)  -n(x)  =  0 
for  x  G  dO  fl  dfl.  The  Euler  equation  w.r.t.  u  is  given 
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by  a  non-local  Poisson  equation,  i.e.  a  Poisson  equation 
with  non-local  coefficients: 

V  •  [c{z)Vu{z)\  =  V  •  [c(z)v(z)},  (11) 

for  all  z  G  O,  where  u\o c  =  u,  c(z )  =  J0c  m(z ,  S)cL8  and 
the  field  t;  :  O  — »  M2  is  given  by 

v(z)  =  — —  [  m(z,  z)Vu(z)dz.  (12) 

c(^) 


Functionals  combining  intensity  and  gradients.  We  can 
use  any  of  the  gradient-based  energies  in  conjunction 
with  the  patch  NL-means  energy  by  considering  a  linear 
combination  of  the  corresponding  patch  error  functions. 
This  yields 

Fx(u)=  f  [  m(z,z)[(l  —  A)||Vrt(z)  —  \7u(z)\\p+ 

Jo  Joc 

+  \\u(z)  —  u(z)\2]dzdz  (14) 


The  solution  of  (11)  is  computed  with  a  conjugate  gra¬ 
dient  algorithm. 

Observe  that  the  solutions  of  (11)  are  minimizers  of 

[  c(z)\\Vu(z)  -  v(z)\\2dz. 

Jo 

Therefore,  u  is  computed  as  the  image  with  the  closest 
gradient  (in  the  L 2  sense)  to  the  guiding  vector  field 
v,  which  corresponds  to  a  non-local  weighted  average 
of  the  gradients  in  the  complement.  See  [47]  for  further 
uses  of  the  Poisson  equation  in  image  editing. 


Patch  non-local  gradient  medians.  To  complete  our  study 
of  the  image  energy  term  we  consider  the  L1-norm  of 
the  gradient,  that  is 

Fw(u)  =  /  /  m(z,  z)\\ Vu(z)  —  \7u(z)\\dzdz, 

Jo  Joc 

with  the  same  boundary  conditions  as  with  the  patch- 
wise  NL-Poisson.  To  minimize  it  we  perform  an  implicit 
gradient  descent: 

min  /  /  m(z,  z)\\ \7utJrl(z)  —  \7u(z)\\dzdz+ 

ut+1  Jo  Joc 

+  (13) 

where  at  each  step  ut+1  is  computed  using  the  fixed 
point  algorithm  described  in  the  Appendix  B,  based  on 
the  projection  method  presented  in  [18]. 

As  patch  NL-Poisson,  this  scheme  transfers  gradi¬ 
ents  and  interpolates  the  gray  levels  using  the  bound¬ 
ary  conditions.  With  the  use  of  the  L 1  error  function, 
we  expect  the  solution  of  patch  NL-GM  to  retain  more 
small  scale  detail  than  that  of  patch  NL-Poisson. 

Notice  that  for  both  gradient-based  methods  the 
patch  similarity  weights  w  are  computed  based  only 
on  the  gradients  (and  thus  also  the  pixel-wise  influ¬ 
ence  weights  m).  In  most  cases  however,  the  gradient 
is  not  a  good  feature  for  measuring  the  patch  similar¬ 
ity,  and  it  would  be  desirable  to  consider  also  the  gray 
level  data.  This  can  easily  be  achieved  within  our  vari¬ 
ational  framework  by  modifying  the  patch  error  func¬ 
tion  to  take  into  account  both  gradients  and  gray  level 
patches. 


where  the  parameter  A  G  (0, 1)  controls  the  mixture 
and  p  is  either  1  or  2.  The  resulting  algorithms  update 
the  weights  w  considering  both  the  intensity  and  the 
gradient,  therefore  improving  their  selectivity. 

Regarding  the  image  update,  notice  that  (14)  can 
be  rewritten  as 

F\(u)  oc  /  /  m(z,  z)\\ Vu(z)  —  Vu(z)\\pdzdz 

Jo  Joc 

+  Jo  c(*)K*)  _  /G) \2<iz’ 

where  f(z)  =  c(z)~1  fQc  m(z,  z)u{z)dz  is  the  solution 
of  the  patch  NL-means  image  update  step.  Thus  we  see 
that  the  combination  with  the  squared  L 2  patch  error 
function  translates  into  a  patch  NL-means  attachment 
term. 

For  the  case  of  patch  NL-Poisson  the  Euler  equation 
w.r.t.  u  becomes: 

V  •  [c(*)Vu(*)]  -  — ^yc(*)u(2)  = 

=  V  •  [c(z)v(z)\  -  (1*A)c(*)/(*),  (15) 

which  is  linear  and  can  be  solved  with  a  conjugate  gra¬ 
dient  scheme. 

The  patch  NL-GM  combined  functional  has  basi¬ 
cally  the  same  form  as  (13)  and  therefore  can  also  be 
solved  with  the  fixed  point  scheme  described  in  Ap¬ 
pendix  B. 


2.2  Comparison  of  proposed  schemes 

Let  us  advance  some  results  on  synthetic  problems  to 
highlight  the  main  characteristics  of  the  proposed  meth¬ 
ods.  First  we  consider  the  inpainting  of  a  regular  texture 
(shown  in  Figure  2)  with  two  different  mean  intensities, 
where  the  inpainting  domain  hides  all  patches  on  the 
boundary  between  the  dark  and  bright  textures.  With 
this  example  we  can  test  the  ability  of  each  method 
to  create  an  interface  between  both  regions.  Situations 
like  these  are  common  in  real  inpainting  problems,  for 
instance  due  to  inhomogeneous  lighting  conditions.  We 
have  also  added  Gaussian  noise  with  standard  deviation 
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Fig.  2  Inpainting  of  a  synthetic  texture.  The  initial  condition  is  shown  in  the  first  column.  The  other  four  columns  show  a  zoom 
(region  in  the  red  rectangle)  of  the  results  of  patch  NL-means,  -medians,  -Poisson  and  -GM.  Top  row,  h  =  0,  bottom  row  h  =  200, 
h  =  14,  h  =  400  and  h  =  20,  respectively.  The  intra-patch  weight  kernel  ga  is  shown  in  the  bottom  right  corner  of  the  initial  condition, 
it  has  a  standard  deviation  a  =  5  and  the  patch  size  is  s  =  15. 


a  =  10  to  show  the  influence  of  the  selectivity  param¬ 
eter  h.  Each  column  of  Figure  2  shows  the  results  of 
the  four  methods  described  in  the  previous  section.  We 
have  tested  each  method  with  h  =  0  (top  row),  and 
h  >  0,  chosen  approximately  to  match  the  expected 
deviation  of  each  patch  error  due  to  the  noise. 

The  first  notorious  difference  is  on  how  the  meth¬ 
ods  handled  the  transition  between  the  dark  and  bright 
textures.  Patch  NL-means  produces  a  smooth  transi¬ 
tion  whereas  a  sharp  step  is  obtained  with  the  patch 
NL-medians.  On  the  other  hand,  both  gradient  based 
methods  yield  a  much  smoother  shading  of  the  texture. 
This  is  due  to  the  fact  that  the  image  update  step  is 
computed  as  the  solution  of  a  PDE  which  diffuses  the 
intensity  values  present  at  the  boundary  of  the  inpaint¬ 
ing  domain.  These  PDEs  are  driven  by  a  vector  field  es¬ 
timated  non- locally  and  therefore  combine  non-local  ex¬ 
emplar  based  inpainting  with  local  interpolation  PDEs. 
For  the  case  of  patch  NL-Poisson  this  interpolation  is 
linear,  since  this  is  a  solution  of  the  homogeneous  Pois¬ 
son  equation  (i.e.  Laplace  equation). 
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Fig.  3  Profiles  of  the  results  in  Figure  2.  The  profiles  are 
taken  from  an  horizontal  line  going  between  the  circles  in  Figure 
2.  Top:  results  with  h  —  0  and  bottom:  results  with  h  >  0. 


As  expected,  the  results  using  a  higher  h  show  some 
denoising,  since  for  larger  h  the  patch  similarity  weights 
are  less  selective.  This  effect  can  be  better  appreciated 
in  the  profiles  shown  in  Figure  3,  which  depict  the  image 
values  for  a  horizontal  line  between  the  circles.  In  the 
usual  context  of  inpainting,  in  which  the  available  data 
is  not  perturbed  by  noise,  this  denoising  translates  into 
an  undesirable  loss  of  texture  quality.  For  that  reason 
we  use  h  =  0. 

Recall  that  in  the  limit  when  h  =  0,  the  weights 
w{x^x)  converge  to  a  Dirac’s  delta  function  at  the  set 
n(x)  of  nearest  neighbors  of  pu(x).  Even  if  we  assume 
that  the  nearest  neighbor  is  unique,  the  value  of  a  pixel 
is  still  computed  from  a  population  obtained  from  those 
nearest  neighbors  of  all  patches  that  overlap  the  pixel. 
In  the  case  of  periodic  patterns,  once  the  minimization 
has  reached  a  stable  state,  all  values  in  the  population 
will  be  basically  the  same:  patches  surrounding  a  pixel 
agree  on  its  value,  and  in  this  case  all  schemes  behave 
similarly.  Differences  arise  when  neighboring  patches 
cannot  agree  on  their  suggested  values.  Such  is  the  case 
of  the  step  in  Fig.  2.  For  non-periodic  patterns  and  ran¬ 
dom  textures  this  disagreements  will  be  common,  which 
may  affect  the  perceptual  similarity  of  the  synthesized 
texture  with  the  original. 

Figure  4  studies  this  effect.  The  image  consists  of 
Gaussian  noise.  The  top  row  shows  the  resulting  com¬ 
pletions,  and  the  bottom  row  shows  an  estimate  of  the 
local  variance  (computed  by  smoothing  the  image  of  the 
squared  differences  w.r.t.  the  mean).  The  red  curves  re¬ 
quire  a  brief  explanation.  Following  [7],  we  use  the  term 
Nearest  Neighbor  Field  (NNF)  to  refer  to  the  vector 
field  n(x)  —  x,  defined  over  O,  where  n(x)  G  Oc  is  the 
position  of  the  (assumed)  unique  nearest  neighbor  of 
pu{x).  In  Figure  4  we  show  in  red  the  boundaries  of  the 
regions  with  constant  NNF.  In  those  regions  patches 
are  translated  rigidly  from  somewhere  in  the  comple¬ 
ment.  We  can  observe  that,  for  all  inpainting  schemes, 
the  resulting  completion  consists  basically  of  a  patch- 
work  of  large  regions  of  rigid  translation.  The  interior  of 
these  regions,  away  from  their  boundaries  (red  curves), 
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Fig.  4  Inpainting  of  random  texture.  From  left  to  right:  patch-  NL-means,  NL-medians,  NL-Poisson  and  NL-GM.  Top  row: 
inpainting  results.  Bottom  row:  local  variance  estimate  of  the  noise,  superimposed  with  the  boundaries  between  regions  of  constant 
NNF  (red  curves).  Notice  the  attenuation  of  the  variance  over  the  red  curves,  specially  for  L2-based  methods. 


Fig.  5  Inpainting  of  structured  texture.  From  left  to  right:  Initial  condition,  result  of  path  NL-means,  -medians,  -Poisson  and 
-GM.  Results  with  s  =  15,  a  =  5  and  h  =  0.  The  treatment  of  color  images  is  described  in  Section  5. 


reproduces  an  exact  copy  of  the  source,  and  thus  the 
variance  is  preserved.  The  variance  decreases  along  the 
red  curves,  where  pixel  values  are  synthesized  using  in¬ 
coherent  contributions.  Both  for  the  intensity  models  or 
the  gradient  models,  the  L2-norm  causes  a  higher  decay 
of  the  variance  than  with  the  Zd-norm.  This  effect  can 
also  be  observed  in  figures  2  and  3,  where  it  can  be  seen 
that  patch  NL-medians  and  -GM  preserve  more  of  the 
Gaussian  noise  than  the  methods  based  on  L2-norms. 
We  will  refer  to  the  regions  of  constant  NNF  as  copy 
fronts.  Recall  however  that  copy  occurs  only  away  from 
their  boundaries. 

In  Figure  5  we  show  results  on  a  non-periodic,  struc¬ 
tured  texture.  We  can  see  that  patch  NL-means  and  - 
Poisson  show  some  smoothing,  whereas  both  Zd-based 
schemes  obtained  sharper  results.  Both  intensity  based 
methods  show  discontinuities  at  the  boundary  of  the 
inpainting  domain.  Notice  also  that  for  the  patch  NL- 
medians  algorithm  it  is  easy  to  identify  the  regions  in 
the  complement  that  have  been  replicated.  Where  two 
copy  fronts  meet  a  seam  is  produced.  With  the  patch 
NL-means  the  spatial  averaging  of  overlapping  patches 
creates  a  smooth  blending  of  the  copy  fronts. 

2.2.1  Combined  schemes 

Gradient  based  methods  produce  smooth  interpolations 
and  enforce  the  continuity  of  the  image  at  the  boundary 
of  the  inpainting  domain,  which  are  generally  desirable 
features.  For  this  to  be  done  variationally  the  similar¬ 


ity  weights  w  have  to  be  computed  using  patches  of 
the  gradient,  which  in  most  cases  does  not  provide  a 
reliable  measure  of  patch  similarity.  In  practice  better 
results  are  obtained  by  combining  these  methods  with 
patch  NL-means,  allowing  to  take  the  image  values  into 
account  for  the  computation  of  the  patch  error,  and  to 
synthesize  the  image  with  a  diffusion  PDE.  This  adds 
a  parameter  A,  which  controls  the  mixture. 

In  Figure  6  we  show  some  results  corresponding  to 
the  combination  of  both  gradient  schemes  with  patch 
NL-means  while  varying  the  mixture  coefficient  A.  The 
image  shows  a  periodic  pattern  with  an  illuminance  gra¬ 
dient.  Most  of  the  dark  exemplars  are  incomplete,  and 
thus  only  bright  exemplars  from  the  bottom  of  the  im¬ 
age  are  available.  The  rightmost  detail  shows  the  result 
of  patch  NL-means:  the  image  has  been  completed  us¬ 
ing  bright  patches  and  presents  a  discontinuity  on  the 
upper  side  of  the  hole.  On  the  other  hand,  a  completion 
using  gradients  only  (see  result  of  the  patch  NL-GM) 
manages  to  interpolate  both  the  texture  and  the  shad¬ 
ing.  The  small  images  on  the  right  show  results  of  both 
gradient  methods  with  different  values  for  A.  The  value 
of  the  mixing  parameter  A  should  be  carefully  selected 
since  it  mixes  two  different  magnitudes  (norms  of  gra¬ 
dients  and  gray  levels).  With  A  ^  0.1  for  patch  NL- 
Poisson  +  NL-means  and  A  ~  0.01  for  patch  NL-GM 
+  NL-means,  some  of  the  good  continuation  properties 
are  preserved  and  enough  color  information  is  added  to 
the  patch  metric. 
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Fig.  6  Linear  combination  of  gradient  based  methods  with  NL-means.  First  image  starting  from  the  left:  Initialization.  The 
gray  rectangle  is  the  inpainting  domain.  Only  patches  centered  outside  the  red  area  are  available.  Second  image:  Result  obtained  with 
patch  NL-GM,  using  A  =  0.  A  similar  result  is  obtained  with  the  patch  NL-Poisson  with  A  =  0  (not  shown).  Details:  Top,  from  left 
to  right:  results  with  patch  NL-Poisson  with  A  =  0.01,  0.05,  0.1, 1  (the  latter  corresponds  to  patch  NL-means).  Bottom,  patch  NL-GM 
with  A  =  0.001,0.005,0.01.  All  details  have  been  linearly  stretched  for  display.  Differences  are  most  noticeable  in  the  top  and  in  the 
left  of  the  images. 


2.2.2  Geometric  interpolation 


To  evaluate  the  ability  of  each  method  to  continue  the 
geometry  of  the  structures  at  the  boundary  of  the  in¬ 
painting  domain,  we  consider  a  very  simple  image  with 
a  gap  (shown  in  Table  1).  The  inpainting  region  is  ini¬ 
tialized  with  the  background  color.  For  this  evaluation 
we  fix  the  size  of  the  patch  and  increase  the  width  of 
the  gap.  For  narrow  gaps,  the  method  will  be  able  to 
join  both  ends  of  the  green  vertical  line.  When  increas¬ 
ing  the  gap,  at  a  certain  width  the  method  is  no  longer 
capable  of  recovering  the  vertical  line.  In  these  cases 
the  gray  initialization  prevails,  showing  a  completion 
in  which  the  vertical  line  is  interrupted.  The  first  col¬ 
umn  of  Table  1  shows  the  maximum  height  of  the  gap 
for  each  method.  Observe  that  the  combined  schemes 
improve  the  prolongation  of  geometric  structures,  and 
that  the  optimal  mixing  parameter  A  for  this  purpose  is 
near  to  the  values  that  we  have  proposed  in  the  previous 
section.  Basically  these  schemes  have  two  propagation 
mechanisms:  A  local  one,  by  diffusion  of  the  intensity 
values  by  the  PDE,  and  a  non-local  one  by  transference 
of  gradients  from  Oc.  When  A  >  0  these  mechanisms 
reinforce  each  other:  the  diffused  values  allow  a  better 
estimation  of  the  weights,  and  therefore  the  transfer¬ 
ence  of  more  appropriate  gradients,  which  will  help  to 
diffuse  the  intensity  values  further.  On  the  other  hand, 
intensity  based  methods  depend  only  on  the  iteration 
of  weights  computation  and  image  update  to  propagate 
information. 

The  second  column  of  Table  1  shows  the  results  ob¬ 
tained  by  incorporating  the  confidence  mask  later  de¬ 
scribed  in  Section  2.3.  An  alternative  way  to  prolong 
the  geometric  structures  is  to  increase  the  patch  size  as 
will  be  discussed  in  Section  4. 


Method 

GAP 

GAP 

tc  =  5 

P  +  M  A  =  0 

13 

40 

P  +  M  A  =  0.1 

16 

46 

P  +  M  A  =  0.5 

11 

34 

P  +  M  A  =  1 

9 

29 

Md 

7 

42 

GM  +  M  A  =  0 

15 

>  56 

GM  +  M  A  =  0.01 

21 

>  56 

GM  +  M  A  =  0.001 

40 

>  56 

Table  1  Geometric  interpolation.  The  inpainting  domain  is 
shown  in  white  and  the  patch  in  the  lower  right  corner  (9x9 
pixels).  The  table  reports  the  maximum  gap  width  for  which  the 
algorithm  is  capable  of  recovering  the  vertical  line.  P,  M,  Md  and 
GM  stand  for  patch  NL-Poisson,  -Means,  -Medians  and  -Gradient 
Median  respectively.  The  rightmost  column  shows  the  maximal 
gaps  obtained  with  the  use  of  a  confidence  masks  with  decay 
tc  =  5  (see  Section  2.3). 


2.3  Extensions 

Color  images.  An  energy  for  color  images  can  be  ob¬ 
tained  by  defining  a  patch  error  function  for  color  patches 
as  the  sum  of  the  error  functions  of  the  three  scalar 
components: 

3 

e(Pu(x)  -Pu(x))  =  ^e(pUi(x)  -Pui(x)), 

where  u  :  Q  — »  M3  is  the  color  image,  and  Ui,  with 
i  =  1,2,3,  its  components  (analogously  for  gradient- 
based  errors).  Given  the  weights,  each  channel  is  up¬ 
dated  using  the  corresponding  scheme  for  scalar  images. 
All  channels  are  updated  using  the  same  weights. 

Confidence  mask.  For  large  inpainting  domains,  it  is 
useful  to  introduce  a  mask  n  :  Q  — »  (0, 1]  which  assigns 
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a  confidence  value  to  each  pixel,  depending  on  the  cer¬ 
tainty  of  its  information  (see  also  [22,39]).  This  will  help 
in  guiding  the  flow  of  information  from  the  boundary 
towards  the  interior  of  the  hole,  eliminating  some  local 
minima  and  reducing  the  effect  of  the  initial  condition. 
The  resulting  image  energy  term  takes  the  form 


,(u)  =  L  f 

Jo  Joc 


K,(x)w(x,x)£(pu(x)  —  Pu(x))dxdx, 


where  ft  modulates  the  penalization  of  the  incoherences 
between  w  and  the  error  6  between  patches. 

The  effect  of  ft  on  the  image  update  step  can  be  seen 
on  the  pixel- wise  influence  weights  m 


rn 


Xqc  (z  -  y)9a{y)K{z  -  y)w(z  —  y,z  —  y)dy. 


Thus,  the  contribution  of  the  patch  pu(z  —  y )  to  the 
evidence  function  is  now  weighted  by  its  confidence. 
Patches  with  higher  confidence  will  have  a  stronger  in¬ 
fluence.  Notice  that  with  the  inclussion  of  the  confi¬ 
dence  mask,  the  normalization  coefficient  c(z)  becomes: 


c(z)  =  /  m(z,z)dz=  /  ga(y)n(z-y)dy. 

J  Oc  J  fip 

This  does  not  affect  intensity-based  methods,  but  has 
implications  on  the  image  update  step  for  gradient- 
based  ones.  For  example,  for  patch  NL-Poisson  the  er¬ 
rors  with  respect  to  the  non-local  guiding  vector  field 
||  Vu(z)  —  v(z)  ||  are  penalized  according  to  c(z). 

On  the  similarity  weights,  the  confidence  mask  has 
the  effect  of  modifying  the  selectivity  parameter  h  by 
a  locally  varying  h/ k(x).  If  the  confidence  is  high,  the 
effective  selectivity  h/ n{x)  will  be  lower,  thus  increasing 
the  selectivity  of  the  similarity  measure.  When  h  — >  0 
the  weights  tend  to  a  Dirac’s  delta  independently  of 
ft.  The  same  reasoning  applies  to  the  gradient  based 
energies. 

For  the  experiments  shown  in  this  paper,  the  confi¬ 
dence  mask  was  set  to 


(1  -  ft0)  exp  if  x  e  O, 

1  if  x  E  Oc, 


which  shows  an  exponential  decay  w.r.t.  the  distance 
to  the  boundary  inside  the  hole  d(-,dO).  Here  tc  >  0 
is  the  decay  time  and  fto  >  0  determines  the  asymp¬ 
totic  value  reached  far  away  from  the  boundary.  Setting 
tc  =  0  amounts  to  using  a  constant  confidence  mask. 
Table  1  shows  the  effect  of  using  a  confidence  mask  with 
tc  =  5  and  fto  =  0.1,  allowing  the  restoration  of  the  ver¬ 
tical  line  for  much  wider  gaps,  and  thus  alleviating  the 
dependence  with  the  gray  initial  condition. 

Other  shapes  of  the  confidence  mask  could  be  used 
for  controlling  different  aspects  of  the  dynamics  of  the 


completion  algorithm.  For  instance  controlling  the  de¬ 
cay  of  the  mask  from  certain  points  of  the  boundary 
allow  to  privilege  the  continuation  of  structures  from 
them. 


Interpolation  of  sparsely  sampled  images.  In  this  work 
we  have  implicitly  assumed  the  existence  of  a  set  of 
complete  patches  Oc,  constituting  the  set  of  exemplars. 
This  assumption  can  be  removed  by  generalizing  the 
patch  error  function  £(•)  so  that  it  depends  on  the  posi¬ 
tions  of  the  patches.  This  implies  a  potentially  different 
error  function  for  each  pair  of  patches.  This  has  been 
explored  in  [29] ,  applied  to  the  case  of  interpolation  of 
sparsely  sampled  images.  In  that  case  the  patch  error 
function  was  designed  to  consider  only  those  locations 
known  in  at  least  one  of  the  two  patches. 


3  Discussion  and  connections 

In  this  section  we  present  various  issues  related  to  the 
variational  framework  and  its  connections  with  other 
models. 


3.1  Probabilistic-geometric  model  interpretation 

The  proposed  energy  has  an  interpretation  in  terms  of 
a  probabilistic  model  in  the  space  of  patches,  which 
becomes  apparent  when  rewritten  using  the  general¬ 
ized  Kullback-Leibler  divergence  [23].  Given  two  posi¬ 
tive  and  integrable  functions  p,  q  defined  over  a  certain 
measure  space  T,  the  generalized  Kullback-Leibler  di¬ 
vergence  is  given  by: 


KL  0,  q)=  p(s)  log 
J  X 


P(s) 

<?(s) 


d  s- 


/  p(s)ds  +  /  q(s)ds, 
Jx  Jx 


assuming  that  the  integrals  exist.  With  this  notation 
(and  taking  into  account  that  w(x,-)  is  a  probability) 
the  functional  E  can  be  written  as 

E(u:w)  =  KL  (w(x,  •),  r(x,  •))  dx— 

Jo 

—  r(x,  x)dxdx, 

Jo  Joc 

where  r  is  the  unnormalized  Gaussian  weight  function 


r(x,x)  =  exp  (  -~s(pu(x)  - Pu{x)) 
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The  first  term  integrates  the  divergence  between  the 
functions  w(x,-)  and  r(x,  •),  for  each  x  G  O.  The  second 
term  can  be  interpreted  by  noticing  that 

q{pc)  =  r(x,x)dx  (16) 

Joc 

is  a  density  estimate  (in  the  patch  space)  of  the  set  of 
patches  in  Oc:  The  higher  the  amount  of  patches  in  Oc 
close  to  pu(x)  (according  to  the  scale  parameter  h),  the 
higher  the  value  of  q. 

The  minimizers  (u*,w*)  are  obtained  when  for  all 
(x,x)  G  O  x  Oc,  w*(x,x)  =  r*(x,  x)/q*(x),  (Gaussian 
weights  normalized  by  (16)),  and  the  patches  of  the  in- 
painted  image  are  in  regions  of  high  density  in  the  patch 
space.  This  provides  a  geometric  intuitive  interpreta¬ 
tion  of  our  variational  formulation.  The  image  is  con¬ 
sidered  as  an  ensemble  of  overlapping  patches.  Known 
patches  in  Oc  are  fixed,  forming  a  patch  density  model 
used  to  estimate  the  patches  in  O.  The  richness  of  the 
framework  is  given  in  part  by  the  fact  that  different 
norms  in  the  patch  space  induce  inpainting  schemes  of 
different  nature. 

3.2  Connections  with  statistical  mechanics 

The  proposed  energy  can  be  given  another  interpreta¬ 
tion  in  terms  of  statistical  mechanics,  as  the  Helmholtz 
free  energy  of  a  system  of  particles.  In  this  context 
we  consider  that  for  each  “particle”  x  G  O  there  is  a 
set  of  possible  configurations  indexed  by  the  parameter 
x  G  Oc  with  a  probability  density  w(x,x).  If  we  con¬ 
sider  that  each  configuration  has  an  energy  U(x,x)  = 
z(Pu{x)  then  the  probability  that  the  particle 

at  x  is  in  the  state  x  is  proportional  to 

e~f>U(x,*)/q(x) 

with  the  normalization  factor  q  given  by  the  partition 
function 

q(x)  =  Y^e~f>Ulx,it)- 

X 

Then  the  Helmholtz  free  energy  for  a  particle  x  is 
H(x)  :=  U{x)  +  p-'Six)  := 

:=  U(x,  x)w(x,  x)  +  /3_1  w(x,  x)  log  w(x,  x). 

X  X 

Then  the  total  Helmoltz  free  energy  for  the  system  of 
particles  is 

H  :=  ^H(x)  =  EE  U (x,  x)w(x,  x)+ 

X  Xx 

^EE  w(x,  x)  log  w(x,  x), 

X 


where  the  sums  are  extended  to  x  G  O  and  x  G  Oc.  We 
have  used  the  notation  (3  as 

3.3  Revisiting  related  work 

In  this  section  we  will  briefly  review  the  connections  of 
our  work  with  other  inpainting  algorithms  and  also  with 
existing  variational  models  of  non-local  regularization 
which  have  been  proposed  in  contexts  such  as  image 
denoising. 

The  method  in  [56]  is  closely  related  to  the  patch 
NL- means  scheme  of  Eq.  (9).  The  key  difference  lies  in 
the  underlying  theoretical  model.  The  problem  is  ad¬ 
dressed  as  a  MRF,  where  pixels  outside  the  hole  are 
observable  variables,  missing  pixels  in  the  hole  are  the 
parameters,  and  the  hidden  variables  are  given  by  the 
correspondence  r  :  O  — »  Oc ,  which  assigns  a  patch  out¬ 
side  the  hole  to  each  x  in  O.  The  method  can  be  seen 
as  an  approximate  EM  algorithm  for  maximizing  the 
log-likelihood  w.r.t.  the  pixels  in  O,  and  some  approx¬ 
imations  have  to  be  taken  to  make  the  optimization 
tractable.  Based  on  heuristics,  the  authors  also  pro¬ 
pose  to  use  more  robust  estimators  than  the  mean  for 
the  synthesis  of  pixels.  Within  the  framework  here  pro¬ 
posed,  robust  estimators  (as  the  median)  naturally  re¬ 
sult  from  particular  choices  of  the  patch  error  functions 
£(•)• 

In  [38]  an  energy  is  presented  which  can  be  seen  as 
a  limit  case  of  the  patch  NL-means  energy  when  h  — >  0. 
The  authors  propose  modifications  of  the  energy  which 
improve  the  results,  such  as  some  spatial  localization  of 
the  similarity  weights  and  brightness  invariance.  The 
latter  is  achieved  by  introducing  a  multiplicative  con¬ 
stant  that  matches  the  mean  illuminance  between  each 
pair  of  patches. 

The  patch  NL-means  algorithm  is  also  related  to  the 
manifold  image  models  of  [49].  Eq.  (9)  can  be  split  into 
two  steps  which  are  analog  to  the  manifold  and  image 
projection  steps  used  in  [49].  First,  for  each  patch  cen¬ 
tered  in  O  we  compute  a  new  patch  as  a  weighted  aver¬ 
age  of  all  patches  in  the  complement,  according  to  the 
patch  similarity  weights  p^s(z)  :=  f^c  w(z,  z)pu(z)dz 
with  z  e  O.  Doing  this  for  each  hole  position  yields  an 
incoherent  ensemble  of  patches.  The  image  is  obtained 
by  averaging  these  patches:  u{z)  =  A(^  y  fQ  p^S{z  — 
y,  y)dy.  We  use  a  density  model,  instead  of  the  manifold 
model  of  [49].  Indeed,  p^fs(x)  is  the  mean  shift  opera¬ 
tor  applied  to  pu(x).  It  is  known  that  the  iteration  of 
this  operator  corresponds  to  an  adaptive  gradient  as¬ 
cent  of  the  Parzen  estimate  of  a  PDF  [21],  which  in 
this  case  is  generated  by  the  set  of  patches  in  the  com¬ 
plement  of  the  hole.  The  use  of  a  density  model  entails 
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some  advantages,  mainly  from  the  computational  point 
of  view,  learning  a  manifold  model  is  computationally 
costly.  Furthermore,  the  assumption  that  patches  lie  on 
a  manifold  is  questionable  (one  could  think  for  instance 
in  a  stratification  as  a  more  realistic  model),  and  its  di¬ 
mension  is  hard  to  determine  for  real  images. 

In  the  following  we  will  comment  on  the  relation  of 
this  model  with  recent  works  on  non-local  regulariza¬ 
tion. 

The  UINTA  algorithm,  presented  in  [5]  is  a  non¬ 
local  denoising  algorithm  that  minimizes  the  entropy 
of  the  patches  in  the  image.  Casting  this  idea  to  the 
context  of  inpainting  the  UINTA’s  entropy  is  estimated 
as  the  sample  mean 


Eu(u )  =  -  /  log 
Jo 


J~exp(-^\\pu(x)  -pu{x) ||2)df 


dx 


where  the  inner  integral  is  the  probability  of  occurrence 
of  the  patch  pu(x)  obtained  as  a  Parzen  density  esti¬ 
mate.  The  corresponding  Euler-Lagrange  equation  can 
be  solved  with  a  fixed  point  iteration  which  coincides 
with  the  patch  NL-means  scheme  (9).  In  [5]  this  en¬ 
ergy  is  minimized  by  considering  all  patches  as  inde¬ 
pendent  (disregarding  the  overlap  between  neighboring 
patches),  and  evolving  each  of  them  according  to  a  gra¬ 
dient  descent  of  Ejj .  After  this,  an  image  is  formed  with 
the  centers  of  these  new  patches.  The  repetition  of  this 
process  results  in  an  iterative  application  of  pixel-wise 
NL-means. 

In  [13]  the  authors  use  a  variational  principle  for 
deriving  the  iterated  pixel  NL-means  regularizer,  and 
show  its  application  to  the  restoration  of  texture.  The 
underlying  energy  corresponds  to  the  quadratic  penalty 
between  the  solution  image  u ,  and  a  pixel  NL-means 
type  average  of  the  noisy  input  image  u.  The  weights  for 
this  average  are  computed  using  u.  Due  to  the  depen¬ 
dence  of  the  weights  with  the  regularized  image  u,  the 
minimizer  is  no  longer  a  weighted  average  as  NL-means, 
but  the  solution  of  a  nonlinear  optimization  problem. 
It  is  shown  that  if  the  derivative  of  the  nonlinear  com¬ 
ponent  is  neglected,  the  resulting  Euler-Lagrange  equa¬ 
tion  matches  the  proposed  fixed  point  algorithm:  The 
iterated  NL-means  regularizer. 

In  [52]  the  authors  presented  a  variational  frame¬ 
work  for  image  denoising  consisting  in  non-local  regu¬ 
larization  and  data  adjustment  terms.  Inpainting  could 
be  performed  by  considering  only  the  data  term  as  fol¬ 
lows: 

EP{u)  =  -  exp(e(pu(x)  -  pa(x)))dxdx 

Jo  Joc 

This  energy  is  the  same  as  the  one  adapted  from  the 
UINTA  algorithm  Ejj ,  without  the  logarithm.  In  [52] 


the  Euler-Lagrange  equation  is  solved  with  a  fixed  point 
iteration.  This  model  has  two  differences  with  our  frame¬ 
work.  First  it  allows  to  use  a  more  general  nonlinearity 
for  the  computation  of  the  weights  other  than  the  expo¬ 
nential.  Second,  even  in  the  case  of  the  exponential,  the 
methods  differ  in  the  normalization,  for  instance,  when 
£  is  the  squared  L2-norm,  the  resulting  scheme  is  as  the 
patch  NL-means,  with  the  unnormalized  weights. 

After  its  introduction  in  [3],  our  model  has  been 
interpreted  as  a  non-local  self-similarity  regularizer  in 
[51],  where  in  conjunction  with  appropriate  data  fit¬ 
ting  terms  it  has  been  applied  to  the  solution  of  in¬ 
verse  problems,  including  inpainting,  super-resolution 
and  compressive  sensing.  In  [51]  a  different  patch-error 
function  e  is  used,  namely  the  L2-norm  between  patches 
(without  squaring  it).  This  choice  is  motivated  as  a 
patch-wise  version  of  their  work  [50]  on  non-local  To¬ 
tal  Variation  [32,42,57]  with  adaptive  weights.  This 
patch- wise  non-local  TV  is  defined  as  the  id-norm  of 
the  non-local  gradient  of  the  patch  valued  image  pu  : 
Q  — >  .  The  non-local  gradient  is  defined  as  a  func¬ 

tion  Vwpu  :  Q  x  Q  — >  given  by  \7wpu(x,x)  = 
w(x,x)(pu(x)  —pu{x)).  Thus,  the  patch-wise  non-local 
TV  reads 


IIWpu||  :=  f  f 

Ji 1  JQ 


w{x,x)\\pu(x)  -  pu(x)\\2dxdx. 


Note  that  in  this  sense,  the  model  of  the  patch-NL  me¬ 
dians  corresponds  an  anisotropic  version  of  the  non¬ 
local  TV  where  the  2-norm  in  the  integral  is  replaced 
by  the  1-norm.  Our  work  and  the  work  of  [51]  are  com¬ 
plementary.  In  [51]  the  regularization  term  is  fixed,  and 
the  authors  focus  on  the  possibilities  given  by  differ¬ 
ent  data  terms  suited  for  different  applications.  On  the 
other  hand  in  this  work  we  focus  on  the  regularization 
term  exploring  its  properties  with  different  patch  error 
functions  £,  and  applying  them  to  a  problem  in  which 
the  data  term  plays  no  role  at  all,  since  there  is  no  data 
to  adjust  to. 


4  Multiscale  scheme 

Exemplar-based  inpainting  methods  show  a  critical  de¬ 
pendence  with  the  size  of  the  patch.  In  Figure  7,  we 
show  completions  obtained  with  patch  NL-means  using 
different  patch  sizes:  Two  results  with  a  small  patch 
(a  =  4)  and  one  result  with  a  large  patch  (a  =  19).  The 
latter  is  able  to  reproduce  the  periodic  pattern  of  the 
lamps,  but  the  completion  is  blurry  due  to  the  spacial 
overlap  of  the  patches  and  presents  many  discontinu¬ 
ities  at  the  boundary  of  the  hole. 

The  results  with  the  small  patch  do  not  show  these 
artifacts,  but  one  of  them  has  failed  to  reproduce  the 
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Fig.  7  Mitama  Matsuri.  Left  column:  inpainting  domain  and  initial  condition.  For  the  rest  of  the  columns,  from  left  to  right  single 
scale  inpainting  with  a  9  X  9  patch  with  a  =  4,  single  scale  inpainting  with  a  43  x  43  patch  with  a  =  19  and  multi-scale  inpainting,  with 
S  =  2,  covering  a  patch  size  from  9x9  with  a  =  4  to  43  X  43  with  a  =  19.  All  results  have  been  computed  with  the  patch  NL-means 
scheme.  The  bottom  row  shows  the  regions  in  which  the  nearest  neighbor  offset  is  constant. 


lamps.  The  only  difference  between  both  is  the  initial¬ 
ization.  One  of  them  was  initialized  with  the  original 
image  shown  in  the  bottom  left,  whereas  the  other  one 
with  the  result  obtained  with  the  multiscale  approach 
described  in  this  section. 

Figure  8  shows  a  synthetic  problem  with  similar 
characteristics.  The  result  with  a  large  patch  recovers 
the  shape  of  the  incomplete  rectangle,  but  does  not  con¬ 
tinue  its  texture  correctly.  Three  different  small  scale 
results  are  shown.  They  are  all  minima  of  the  energy 
with  minimal  energy,  E  =  0.  Patches  are  too  small  to 
capture  that  these  completions  are  different.  Which  one 
is  obtained  will  depend  on  the  initialization. 

As  done  by  almost  all  state  of  the  art  exemplar 
based  inpainting  methods  (e.g.  [38,39,56]),  we  will  in¬ 
corporate  a  multiscale  scheme,  described  below.  This  is 
usually  motivated  as  an  heuristic  to  avoid  local  min¬ 
ima,  to  find  a  good  initialization  and/or  to  alleviate 
the  computational  cost.  We  believe  however  (following 
[34])  that,  as  these  examples  suggest,  inpainting  is  in¬ 
herently  a  multiscale  problem:  Images  have  structures 
of  different  sizes,  ranging  from  large  objects  to  fine  scale 
textures  and  edges.  The  multiscale  scheme  responds  to 
the  fact  that  several  patch  sizes  are  needed  to  reproduce 
all  these  structures  properly. 


4.1  Multiscale  scheme 

In  the  following  we  describe  the  multiscale  method  we 
adopted  in  this  work,  which  goes  along  the  lines  of  what 
is  customary  in  the  literature  [30,38,56].  It  consists  on 
sequentially  applying  the  inpainting  scheme  on  a  Gaus¬ 
sian  image  pyramid,  starting  from  the  coarsest  scale. 
The  result  at  each  scale  is  upsampled  and  used  as  ini¬ 


tialization  for  the  next  finer  scale.  The  patch  size  is 
constant  through  scales. 

Let  us  consider  S  scales,  the  finest  denoted  with 
s  =  0.  We  will  specify  the  size  of  the  image  at  the 
coarsest  level  As- 1-  Denoting  the  size  of  the  image  at 
the  finest  scale  by  Ao,  we  compute  the  sampling  rate 
as  r  :=  (Ao/As-i)1^3-1^  E  (0,1).  The  width  of  the 
Gaussian  filtering  is  associated  to  the  subsampling  fac¬ 
tor  as  in  [46].  Let  ao  be  the  size  of  the  patch  and  Eao 
the  corresponding  energy.  We  will  add  the  superindex 
s  =  0, . . . ,  S  —  1  to  the  variables  u  and  w  to  denote 
the  scale.  As  before,  the  subindex  0  refers  the  initial 
condition,  i.e.  Uq  is  the  initial  condition  at  scale  8. 


Algorithm  2  Multiscale  scheme. 

Require:  u3  ,  S',  ao  and  As- 1 
l:  Initialize:  ('a*9-1,  ws_1)  =  argmin^ Eao  (u,  w) 
2:  for  each  scale  s  =  S  —  2, . . . ,  0  do 
3:  Upsample  usJrl  to  obtain  Uq 

4:  ( us ,ws)  =  arg  min(u jW)  Eao  (u,w) 

5:  end  for 


The  upsampling  from  s  +  1  to  s  is  obtained  as  in 
[56].  The  coarse  weights  res+1  are  first  interpolated  to 
the  finer  image  size,  yielding  Wq.  These  weights  are 
then  used  to  solve  an  image  update  step  at  the  new 
scale:  =  minu  Eao  (u,  Wq).  More  conventional  up¬ 

sampling  schemes  by  local  interpolation  (such  as  bilin¬ 
ear  or  splines)  introduce  a  bias  towards  low-frequency 
non-textured  regions.  This  exemplar-based  upsampling 
avoids  this  bias. 

Notice  that  keeping  the  patch  size  constant  while 
filtering  and  reducing  the  image,  is  almost  equivalent 
to  enlarging  the  patch  domain  and  filtering  an  image 
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Fig.  8  Single  scale  inpainting.  Synthetic  inpainting  problem  (the  red  rectangle  is  the  inpainting  domain)  and  several  local  minima 
of  a  single  scale  inpainting  energy  with  a  small  patch. 


of  constant  size.  The  process  can  thus  be  seen  as  the 
sequential  minimization  of  a  series  of  inpainting  en¬ 
ergies  with  varying  patch  size  given  by  as  =  l/rsao, 
s  =  0, . . ,  5  S  —  1,  over  a  corresponding  series  of  filtered 
images. 

The  scheme  starts  by  the  coarsest  scale  5  —  1,  in 
which  a  larger  portion  of  the  inpainting  domain  is  cov¬ 
ered  by  partially  known  patches,  which  makes  the  in¬ 
painting  task  easier,  and  less  dependent  on  the  initial¬ 
ization.  The  energy  at  this  scale  should  have  fewer  local 
minima.  The  dependency  of  the  minimization  process 
on  the  initial  condition  ensures  also  that  each  single 
scale  solution  corresponds  to  a  local  minimum  close  to 
the  coarse  scale  initialization.  Therefore  the  multiscale 
algorithm  exploits  this  dependency  to  obtain  an  image 
vP  which  is  approximately  self-similar  for  all  scales  (or 
equivalently,  for  all  patch  sizes). 

Figure  7  shows  a  comparison  between  single  and 
multiscale  results  with  the  patch  NL-means  scheme. 
As  expected  the  multiscale  result  shows  the  benefits 
of  large  and  small  patch  sizes.  The  missing  lamps  have 
been  completed  with  the  correct  shape  and  spacing  by 
the  coarser  stages,  and  the  fine  details  are  also  taken 
care  off  by  the  finer  scales.  The  bottom  row  shows  the 
regions  of  constant  NNF.  The  single  scale  result  with 
a  large  patch  shows  an  NNF  partitioned  in  relatively 
large  regions,  showing  that  with  a  large  patch  size  the 
copying  is  more  rigid.  The  result  with  a  small  patch 
size  shows  that  the  synthesis  has  been  performed  based 
on  small  regions.  The  multiscale’s  NNF  shows  an  in¬ 
termediate  partition,  with  some  large  regions  inside  of 
the  hole  and  a  smaller  ones  around  its  boundary.  The 
finer  inpaintings  work  by  refining  the  coarse  partition 
obtained  at  coarser  scales. 


5  Experimental  results 

In  this  section  we  further  demonstrate  the  performance 
of  the  proposed  schemes  on  real  inpainting  problems. 
The  images  used  were  obtained  from  Komodakis  and 
Tziritas  [39]  and  from  the  100  images  benchmark  pro¬ 
posed  by  Kawai  et  al.  [38] ,  available  at  http :  / / yokoya . 
naist.jp/research/inpainting/.  Due  to  the  lack  of 
space  we  focus  on  the  methods  presented  in  this  work. 
Please  refer  to  [38,39]  for  a  comparison  with  their  ap¬ 
proaches. 

5.1  Experimental  setting 

We  consider  four  inpainting  methods,  namely  patch 
NL-means,  -medians,  -Poisson  and  -GM.  Both  gradient 
based  methods  are  always  combined  with  patch  NL- 
means  with  mixing  parameter  A,  as  in  (14).  In  all  cases 
we  use  the  multiscale  approach.  As  stated  before,  to 
prevent  blurring  we  set  ft,  the  selectivity  of  the  similar¬ 
ity  weights  re,  to  h  — >  0.  In  this  case,  the  weights  select 
the  nearest  neighbors  of  each  patch  in  O.  We  show  re¬ 
sults  with  the  CIE  La*b*  color  space. 

The  calculation  of  the  weights  dominates  the  com¬ 
putational  load  of  the  algorithms.  With  an  exhaustive 
search  for  the  exact  nearest  neighbor,  the  cost  of  each  it¬ 
eration  is  0(A(0)  x  Al(Oc)  x  s2).  However,  a  significant 
speed-up  can  be  obtained  with  approximate  searches, 
almost  without  any  noticeable  decrement  in  the  quality 
of  the  results  (see  [13]  and  references  therein). 

In  our  implementation  we  use  a  modified  version 
of  the  PatchMatch  algorithm  introduced  in  [7].  Patch- 
Match  is  an  iterative  algorithm  that  estimates  the  NNF 
jointly  for  all  patches  in  the  inpainting  domain  by  ex¬ 
ploiting  the  coherence  of  natural  images.  The  modifi- 
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cation  we  implemented  allows  the  estimation  of  a  lists 
of  the  L  first  nearest  neighbors  of  each  patch.  The  al¬ 
gorithm  has  a  computational  cost  of  0(A(0)  x  s2  x  L) 
per  iteration.  Typically  between  5  and  10  iterations  are 
sufficient  to  obtain  results  comparable  with  the  exhaus¬ 
tive  search  algorithm  when  using  lists  of  size  L  ~  10. 
In  Appendix  C  we  describe  the  modified  PatchMatch. 

The  image  update  step  is  fast  for  the  patch  NL- 
means,  -median  and  -Poisson,  but  can  be  time  consum¬ 
ing  for  patch  NL-GM,  in  particular  for  small  values  of 
the  mixing  parameter  A.  See  Appendix  B  for  more  de¬ 
tails. 

The  results  are  shown  in  figures  9,  10  and  11.  The 
first  column  shows  the  initialization.  The  rest  of  the 
columns  show  the  results  obtained  with  patch  NL-means, 
-medians,  -Poisson  and  -GM  in  that  order.  Obtaining 
good  results  requires  fixing  the  following  parameters: 

Patch  size.  For  almost  all  experiments  we  used  patches 
of  size  s  between  3x3  and  9x9.  We  did  not  use 
Gaussian  intra-patch  weights  to  shape  the  patch, 
since  they  need  a  larger  support,  which  implies  less 
available  exemplars. 

Multiscale  parameters.  The  multiscale  scheme  has  two 
parameters:  the  size  of  the  coarsest  image  As- 1 
and  the  number  of  scales  S.  From  these,  As- 1  is 
the  most  critical  parameter.  Smaller  images  with 
smaller  inpainting  domains  are  easier  to  complete. 
Care  must  be  taken,  however,  since  small  images 
also  imply  less  available  exemplars  to  copy  from.  In 
these  experiments  As- 1  was  set  to  a  20%  of  the 
original  size  when  possible.  Some  cases  required  less 
subsampling.  The  number  of  scales  S  was  set  such 
that  the  subsampling  rate  r  =  ( Ao/As-i « 
(1/2)1/3  «  0.8  as  in  [56]. 

Confidence  mask.  The  confidence  mask  has  two  param¬ 
eters,  the  asymptotic  value  Co  and  the  decay  time  tc. 
For  all  experiments  we  fix  Co  =  0.1  and  used  a  decay 
time  tc  =  5  except  for  small  inpainting  domains,  in 
which  we  set  tc  —  1. 

For  the  mixing  coefficient  A  of  gradient-based  meth¬ 
ods  we  tested  two  configurations:  Low-A  corresponding 
to  A  =  0.01  and  A  =  0.001  for  patch-NL  Poisson  and  - 
GM,  and  high- A  corresponding  to  A  =  0.1  and  A  =  0.01 
for  patch-NL  Poisson  and  -GM.  Recall  that  lower  val¬ 
ues  of  A  give  a  higher  weight  to  the  gradient  compo¬ 
nent  of  the  energy.  This  is  appropriate  for  structured 
images  with  strong  edges.  In  almost  each  row  of  figures 
9,  10  and  11  we  used  the  same  A  configuration  for  both 
gradient-based  schemes. 

The  rest  of  the  parameters  are  also  the  same  for 
each  row  in  figures  10  and  11.  Instead,  for  many  images 
in  Fig.  9  we  used  different  parameters  for  intensity-  and 


gradient-based  methods.  In  these  cases,  image-based 
methods  required  larger  patches  than  gradient-based 
ones. 


5.2  Observations  and  comments 

Gradient  vs.  intensity.  Gradient-based  methods  show  a 
better  reconstruction  of  structure  (Fig.  9)  and  periodic 
patterns  (Fig.  11).  Intensity-based  methods  present  dis¬ 
continuities  at  the  boundary  of  the  hole  and  between 
copy  fronts.  Furthermore,  gradient-based  methods  facil¬ 
itate  the  prolongation  of  structures  and  edges,  due  to 
the  reinforcement  of  local  PDE  diffusion  and  non-local 
propagation  (see  Section  2.2).  This  allows  to  use  smaller 
patches,  which  alleviates  the  computational  load  and 
reflects  in  more  available  exemplars  and  less  blending 
due  to  patch  overlap  (either  by  averages  or  medians). 

On  the  other  hand,  gradient-based  methods  show 
problems  with  random  textures  (Fig.  10)  and  in  some 
occasions  between  copy  fronts.  In  these  cases,  the  near¬ 
est  neighbors  of  overlapping  patches  disagree  on  the 
gradient  of  a  certain  pixel.  This  may  result  in  the  atten¬ 
uation  or  the  omission  of  gradients,  causing  a  “spilling 
effect”,  particularly  for  the  patch  NL-Poisson  [e.g.  rows 

1.3  and  4  of  Fig.  9). 

Some  failures  of  gradient-based  methods  on  random 
textures  result  from  the  use  of  gradients  in  the  patch  er¬ 
ror  function,  as  in  rows  4  and  6  of  Fig.  10.  In  the  former, 
for  instance,  segments  of  the  sky  have  been  reproduced 
in  the  snow.  Fig.  12  shows  results  with  an  inpainting 
scheme  in  which  the  weights  are  computed  based  only 
on  the  image  values  (with  the  squared  L2-norm),  and 
the  image  is  updated  using  patch  NL-Poisson  or  -GM 
with  a  low  value  of  A.  Such  scheme  is  non- variational 
and  its  convergence  its  not  guaranteed. 

We  should  point  out  that  in  many  other  cases,  con¬ 
sidering  gradients  in  the  comparison  criterion  improves 
the  results.  See  for  instance  rows  2  and  4  of  Fig.  11. 

Means  vs.  medians.  It  is  notorious  that  L1 -based  func¬ 
tionals  perform  better  at  the  reproduction  of  fine  tex¬ 
ture.  The  results  of  the  L2  methods  are  smoothed  by 
the  spatial  averaging  of  overlapping  patches.  On  the 
other  hand,  patch  NL-medians  creates  sharp  disconti¬ 
nuities  as  in  Fig.  2  when  different  copy  fronts  meet  {e.g. 
rows  1,3,5  and  7  in  Fig.  9).  These  discontinuities  are 
very  noticeable  and  in  these  cases  some  smoothing  is 
desirable.  For  the  patch  NL-GM  method,  discontinu¬ 
ities  between  copy  fronts  occur  at  the  gradient  level, 
and  are  eliminated  after  its  integration.  The  fact  that 
the  gradient  is  estimated  robustly  helps  in  avoiding  the 
spilling  effect  of  patch  NL-Poisson. 
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Fig.  9  Results  on  structured  images.  From  left  to  right:  Original  image  and  mask,  patch  NL-means,  -medians,  -Poisson  and  -GM. 
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Fig.  10  Results  on  random  textures.  From  left  to  right:  Original  image  and  mask,  patch  NL-means,  -medians,  -Poisson  and  -GM. 
All  images  in  each  row  have  been  generated  with  the  same  parameters. 


We  have  observed  that  L 1  methods  are  more  reluc¬ 
tant  to  make  changes  during  the  minimization  process. 
The  same  robustness  that  allows  a  better  performance 
with  textures  and  avoids  the  spilling  effect  makes  them 
more  greedy.  Once  a  set  of  neighboring  patches  have 
settled  on  a  locally  stable  solution  (typically  a  region 
of  constant  NNF ) ,  it  is  hard  for  the  algorithm  to  change 
that  local  decision.  To  understand  why,  imagine  a  copy 


front  advancing  from  the  boundary  carrying  correct  in¬ 
formation  which  meets  an  already  settled  copy  front, 
which  has  taken  an  undesirable  decision  based  on  a 
bad  initialization.  Pixels  on  the  boundary  of  the  mis¬ 
taken  front  start  receiving  contributions  from  patches  in 
the  advancing  front.  Initially  these  contributions  will  be 
outliers  in  the  distribution  from  which  the  pixel  value  is 
estimated.  The  median  will  discard  these  outliers,  and 
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Fig.  11  Results  on  periodic  textures.  From  left  to  right:  Original  image  and  mask,  patch  NL-means,  -medians,  -Poisson  and 
-GM.  All  images  in  each  row  have  been  generated  with  the  same  parameters. 


the  pixel  value  will  not  change  unless  of  course,  patches 
in  the  advancing  front  outnumber  the  mistaken  ones. 

Although  the  confidence  mask  diminishes  this  ef¬ 
fect,  iA-based  methods  (specially  patch  NL-medians) 
still  show  more  dependence  on  the  initialization.  A  re¬ 
sult  of  this  are  some  misalignments  in  straight  lines, 
due  to  subsampling  artifacts  of  the  multiscale  scheme. 
Also  patch  NL-medians  generally  requires  the  use  of 
larger  patches,  particularly  for  structured  images.  This 
is  not  always  possible,  as  in  row  6  of  Fig.  9,  where  we 
could  not  find  a  proper  set  of  parameters.  However  we 
found  a  good  result  by  after  3  iterations  of  the  multi¬ 
scale  scheme. 


The  focus  of  this  work  lies  more  on  introducing  and 
exploring  the  variational  framework  than  in  presenting 
a  single  inpainting  algorithm.  Still,  we  would  like  to 
comment  on  which  would  be  the  best  method  among 
the  ones  presented.  Based  on  the  previous  observations, 
it  is  clear  that  the  answer  to  this  question  will  depend 
on  the  characteristics  of  the  inpainting  problem.  How¬ 
ever,  a  priori ,  the  patch  NL-Poisson  seems  a  reasonable 
compromise  between  quality  and  computational  cost.  It 
is  also  able  to  provide  good  results  for  a  wider  range  of 
parameters.  For  cases  in  which  an  accurate  reproduc¬ 
tion  of  random  textures  is  important,  patch  NL-GM 
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Fig.  12  Results  with  a  non  variational  algorithm.  The 

gradient  methods  sometimes  introduce  blurry  areas  in  the  solu¬ 
tion.  Here  we  show  the  results  using  a  non  variational  version 
of  the  patch  NL-Poisson  (left  column)  and  patch  NL-GM  (right 
column)  for  some  of  those  images. 


may  be  a  better  option,  however  at  a  higher  computa¬ 
tional  cost. 

6  Conclusions  and  future  work 

In  this  work  we  presented  a  variational  framework  for 
exemplar-based  non-local  image  inpainting.  The  pro¬ 
posed  energy  lends  itself  to  intuitive  interpretations  in 
terms  of  probabilistic  models  and  has  connections  with 
mean  shift  and  statistical  mechanics.  We  minimize  it 
using  a  coordinate  descent  algorithm,  alternating  be¬ 
tween  similarity  weights  and  the  image  updates.  These 
processes  are  coupled  through  the  patch  error  function, 
i.e.  the  criterion  used  to  compare  patches. 

We  derived  from  this  framework  four  different  in¬ 
painting  schemes,  corresponding  to  error  functions  based 
on  the  combinations  of  L1-  and  L2-norms,  with  image 
or  gradient  patches.  After  testing  these  schemes  on  syn¬ 
thetic  and  real  problems  we  arrived  to  the  following 
conclusions. 

1.  The  robustness  of  the  iA-norm  (patch  NL-medians 
and  NL-GM)  makes  it  more  appropriate  for  the  re¬ 


production  of  textures.  These  schemes  are  however, 
more  greedy,  or  “stiff,”  and  therefore  more  depen¬ 
dent  on  the  initialization.  Conversely  the  squared 
L2- norm  introduces  some  blur,  and  show  less  stiff¬ 
ness. 

2.  When  gradients  are  considered  in  the  patch  error 
function,  the  image  update  step  is  computed  as  the 
solution  of  a  (local)  diffusion  PDE,  driven  by  a  non- 
locally  estimated  gradient  field.  This  improves  the 
synthesis  in  terms  of  continuity  at  the  boundary  and 
attenuation  of  seams.  Also  the  local  interpolation 
helps  in  the  propagation  of  structures,  such  as  edges 
at  the  boundary  of  the  hole. 

The  proposed  functional  shows  a  critical  dependence 
with  the  patch  size.  Furthermore,  it  is  nonconvex  and 
has  many  local  minima,  particularly  for  small  patch 
sizes.  To  tackle  these  issues  we  use  a  multiscale  ap¬ 
proach.  This  is  customary  in  the  literature,  usually  mo¬ 
tivated  as  a  way  to  obtain  a  good  initialization  and 
for  computational  reasons.  We  believe  however  that  in¬ 
painting  is  inherently  a  multiscale  problem,  and  that 
underlying  the  multiscale  approach  lies  an  inpainting 
criterion.  We  are  currently  working  on  a  variational  for¬ 
mulation  of  multiscale  inpainting.  Other  topic  of  cur¬ 
rent  research  is  the  use  of  other  patch  error  functions 
based  on  the  comparison  of  structure  tensors,  which 
could  provide  a  more  robust  estimation  of  the  morpho¬ 
logical  structure  of  the  image. 

A  Existence  of  minima  for  the  functional  (2) 

Let  us  consider  the  model 

E{u,w)=  \  f  f  w(x,x)\\pu(x)  -  pu(x)\\2dxdx+  (17) 

h  Jo  Joc 

+  w(x,x)\ogw(x,x)dxdx. 

Jo  Joc 

In  this  section  we  consider  the  squared  L 2  patch  error  function 

I|P«(*)-P«(*)H2=  f  9a(y){u(x  +  y)-u(x  +  y))2dy 
J  f2p 

where  ga  denotes  a  smooth  integrable  kernel  whose  gradient  is 
also  integrable.  Let 

V  :=  {w  :  O  X  Oc  — ►  M  :  w(x,  x)  dx  =  1  a.e.  x  E  O}. 

J  Oc 

Let  us  consider  the  admissible  class  of  functions 

Am  :=  {{u,  w)  :  u  =  u  in  Oc,  \u(x)\  <  M  w  E  V}. 

Proposition  1  Assume  that  u  E  BV(Oc).  There  exists  a  solu¬ 
tion  of  the  variational  problem 

min  E(u,w).  (18) 

(■ u,w)eAM 
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Proof  Let  {un,wn)  be  a  minimizing  sequence  of  (18),  i.e.  a  se¬ 
quence  such  that  E(un,wn)  — ►  ^(u,w)eAm  E(u,w).  Since  ft  is 
a  bounded  domain  we  have  that 

io  «/oc 

is  bounded.  Hence  wn{l  +  log+  wn )  is  bounded  in  L1.  Then  the 
sequence  wn  is  relatively  weakly  compact  in  L1  and  by  extract¬ 
ing  a  subsequence,  if  necessary,  we  may  assume  that  wn  weakly 
converges  in  L1(0  x  Oc)  to  some  w  E  V.  Let  us  now  prove  that 


x+x  [  9a(y)(u(x  +  y)-u(x  +  y ))5 

J  ft  r) 


/ft. 

and 


-if  9a(y)(u(x  +  y)-u(x  +  y)y 

J  On 


d  y 


d  y 


are  uniformly  bounded  when  |u|  <  M.  Indeed,  the  boundedness 
of  the  first  integral  is  a  consequence  of 

^ x-\-x  /  9a(y)(u(x  +y)~  u(x  +  y))2dy 

J  ftp 

=  2  /  ga{y)(u(x  +  y)  -  u(x  +  y)) 

J  ftp 

(Vxu(x  +  y)~  ^Xu(x  +  y))dy 

=  2  ga(y)(u(x  +  y)  -  u(x  +  y)) 

J  ftp 

(Vvu(x  +  y)  -  \7yu(x  +  y))dy 

=  ~  [  Vvga(y)(u(x  +y)~  u(x  +  y))2dy, 

J  ftp 

the  L 1  integrability  of  Vyga(y)  and  the  boundedness  of  u.  We  are 
also  assumming  that  ga  vanishes  at  the  boundary  of  ftp.  Now, 

-X  [  ga(y)(u(x  +  y)-u(x  +  y))2dy 
J  ftrj 

W  9a(y)(u(x  T  y)  -  u(x  +  y)) 

J  ftn 

(Vxu(x  +  y)  +  Vxu(x  +  y))dy 
W  9a(y)(u(x  +  y)  -  u(x  +  y)) 

J  ftp 

(yyu(x  +  y)  +  Vyu(x  +  y))dy 

=  9a(y)(yvu(x  +  y)2  -  Vyu(x  +  y)2)dy 

J  ftp 

+2  /  ga{y)(u(x  +  y)Vyu(x  +  y)  -  u(x  +  y)V yu(x  +  y))dy 

J  ftn 


=  2 

=  2 


Observe  that 


/  9a(y)(u(x  +  y)X7yu(x  +  y)  -  u(x  +  y)X7yu(x  +  y))dy 

J  ft  rn 


j 

J  ftn 


Vyga(y)u(x  +  y)dy 


=  /  Vyga(y)(u(x  +  y)  +  l)u(x  +  y)dy 

J  ftp 

+2  /  ga(y)'Vyu(x  +  y)u(x  +  y)dy  -  /  Vyga(y)u(x  +  y)dy. 

£~2  p  p 

From  our  assumptions,  the  boundedness  of  the  previous  integral 
holds.  Thus,  the  functions 

Vx  [  9a{y){u{x  +  3 /)  -  u(x  +  y))2dy 

J  ftn 


I  ft, 
and 


[  9a{y){u{x  +  y)-u{x  +  y))- 

J  ft  ,n 


dy 


are  uniformly  bounded  when  \u\  <  M .  Thus,  modulo  the  extrac¬ 
tion  of  a  subsequence,  we  may  assume  that  un  — ►  u  weakly  in 
all  Lp,  1  <  p  <  Too  and  ga  *  ( un{x  T  •)  —  un(x  T  -))2  converges 
strongly  in  all  Lp  spaces  and  also  in  the  dual  of  L  Log+  L  to  some 
function  F.  Then  by  passing  to  the  limit  as  n  — >  oo  we  have 

—  f  f  w(x,x)F2  T  [  f  w(x,  x)  log  w(x,  x)dxdx 

h  Jo  Joc  Jo  Joc 

<  liminf  E(un,wn)- 

n 

Taking  test  functions  ip(x,x),  integrating  in  O  X  Oc  and  using 
the  convexity  of  the  square  function,  we  have 

/  9a(y)(u(x  T  y)  ~  u(x  T  y))2dy  <  F. 

J  ftp 

Then  E(u:  w)  <  liminf n  E(un:  wn).  □ 

B  Algorithm  for  solving  Equation  (13) 

Following  [2, 18]  we  derive  here  a  fixed  point  algorithm  for  a  dis¬ 
crete  version  of  (13). 

Discrete  formulation.  We  consider  discrete  images  defined  over  a 
rectangular  bounded  domain  ft  =  {0, ... ,  TV}2.  We  are  given  an 
inpainting  domain  O  C  ft  and  the  known  portion  of  an  image 
u  :  Oc  — ►  M.  Let  us  define  Oe  as  the  set  of  pixels  with  at  least 
one  neighbor  in  O  (according  to  the  4-connectivity).  If  we  think 
as  the  discrete  image  as  a  lattice  of  nodes  joined  by  edges,  all 
variable  edges  are  between  nodes  in  Oe.  We  will  use  the  notation 
[A  — >•  B\  =  {/  :  A  — ►  B },  the  set  of  functions  from  A  to  B. 

We  define  V  :  [ft  [ft  ^  M2]  as 


0 


if  %  =  IV, 


and  similarly  for  the  component  Vu2^-.  The  notation  z  —  ( i,j ) 
refers  to  for  locations  on  the  image.  Let  us  also  define  a  divergence 
operator  V-  :  [ft  — ►  R2]  — >•  [ft  — >  M], 


u(x  T  y) 


/  9a(y)^y  i  /  /-  \  i  i  \ 

Jftp  \(u(x  +  y)  + 1) 

-  /  Vyga(y)u(x  +  y)dy 

J  ftp 

[  'Vy(ga(y)(u(x +  y)  +  l)2) - —  +  — 

J  ft  ,n 


/  Ty)dy 

:=  1 

r  ^  *  =  o,  ( 

-Pi-i, j  if  *  =  iV,  +<j 

J  ftp 

1 

otherwise,  [ 

Pi. 


ph- 


if  j  =  0, 
if  j  =  TV, 


(u(x  T  y)  T  l)2dy 


dy 


(u(x  T  y)  T  1) 


These  operators  incorporate  Neumann  boundary  conditions  on 
the  boundary  of  ft,  and  are  dual  operators,  i.e.  denoting  (•,•) 
the  usual  scalar  products  in  [ft  — >  R2]  and  [ft  — >  M2],  (Vu,p)  = 
—  (it,  V  •  p) ,  for  all  u  G  [ft  — ►  R] ,  p  G  [ft  — >•  R2] . 

Let  us  also  define  U  :=  [Oe  — >•  R]  and  V  :=  [Oe  — ►  M2]. 
In  these  spaces  we  will  consider  the  usual  scalar  products  and 
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will  denote  them  as  (•,•)[/  and  Notice  that,  with  these 

definitions,  the  gradient  and  divergence  are  not  dual  operators 
when  restricted  to  Oe,  i.e.  in  general  (n,  V  •  p)u  /  —  (Vi i,p)v- 
We  are  looking  for  a  completion  u  G  U  :=  {u  G  U  :  uz  = 
uz ,  \/z  G  Oc}  as  the  solution  of  the  following  problem: 

mij  E  E  rnzS\\\'uz  -  vs\\  +  -h  Y  cz(uz -}z)2,  (19) 

ueu  ze oezeD  zot  zeo 

where  v  :  D  — >  M2 ,  m  :  Oe  X  D  — ►  M+  U  {0},  /  :  O  — ►  M  and  c  : 
Oe  — ►  are  given.  For  the  sake  of  generality  we  consider  a 

generic  domain  D  instead  of  the  complement  OJ.  This  generaliza¬ 
tion  is  relevant,  due  for  instance  to  computational  considerations 
as  will  be  discussed  shortly.  Let  us  remark  that,  since  cz  >  0  for 
z  =6  Oe,  the  energy  is  convex  and  the  minimum  is  unique. 

For  simplicity  we  extend  /  over  Oe  by  defining  fz  =  uz  for 
z  G  0e\0.  We  use  the  fact  that  for  any  p  G  M2,  ||?7||  =  sup||e||<1  p- 
s,  and  write  the  energy  as: 


where  v  is  a  time  step  small  enough  for  assuring  the  convergence 
of  the  fixed  point  iteration.  Lastly  the  solution  to  the  primal 
problem  is  recovered  with  (20). 

Convergence.  Let  us  first  observe  that  V fz  —  V fz  =  V[(l  — 
X?)fz]  =  V[x?e^°/z]  for  all  z  G  4?,  which  allows  to  express 
the  dual  energy  (21)  as 

J*(e)  =  jWc-V^v  ■  Ce  +  cVp/StWl  +  (h,e)w,  (24) 

where  hzz  :=  rnzz(v2  —  V[x?e^°/z])  and  (-,  -)w  is  the  usual 
scalar  product  in  W.  This  expression  shows  that,  since  cz  >  0, 
there  is  a  unique  minimum  value  for  V  •  He.  We  prove  now  that 
the  sequence  defined  in  (23)  converges  to  that  minimum. 

Proposition  2  If  cz  =  ^Z^eD  mzz  ^  1  for  z  £  Oe  and  the  time 
step  v  verifies  v  <  A$tK ,  with 


min  sup 

ueu  ||eZJg||<l 


E  YmzzSzz'^7uz~v^+Tz  Y  cz(uz~fz)2- 

z£Oe  z£D  z £ Oe 


K  :=  max 
i,j£Oe 


i,j  x°,i  -x0 

A-i+1,3  ’  A-i ,J 


0+1  ,j 


Lj  O 

X+j+1 


ci,j-\- 1 


(25) 


We  have  defined  the  unit  field  e  G  W  :=  {e  :  Oe  x  D  — »•  M2}. 

Interchanging  the  minimum  with  the  supremum,  we  obtain 
the  following  expression  for  u  in  the  inpainting  domain 

u%  =  fz+Stc-1^  ■  £ez,  zeO.  (20) 

where  we  have  defined  the  linear  operator  C  :  W  — ►  V  as  Cez  = 

X^£ D  rnzz£zz- 

We  can  extend  expression  (20)  to  Oe,  by  defining  a  new  di¬ 
vergence  operator  V-  :  V  —*  U  as  V-£z  :=  x ?  V  •  Here  x?  =  1 
if  z  G  O  and  x?  —  0  otherwise.  We  also  define  a  modified  gra¬ 
dient  V-  :  U  —*■  V  as  the  negative  adjoint  operator  of  V-  ,  given 
by  V uz  :=  ^[x?uz]-  Besides  allowing  to  extend  (20)  these  op¬ 
erators  are  relevant  since,  as  opposed  to  the  usual  gradient  and 
divergence  operators,  they  are  still  dual  when  they  are  restricted 
to  functions  over  Oe:  (V  •  p,u)u  =  —  (p,Vu)v- 

Substituting  expression  (20)  in  the  energy,  we  arrive  to  the 
following  dual  formulation: 

min  —  St  ( Cez  •  V  \c~1  V  •  Ce\  H — c-1  [V  •  Csz}2^] 

-  ^2  ^z-^fz~^2mzzez2-V2  . 
z£Oe  |_  z£D 

Using  that  V[c_1V  •  Ce\z  =  V[c_1V  •  Ce]z  for  all  z  G  42,  we  can 
rewrite  the  dual  problem  with  the  more  compact  expression 


min 

IFzUKi 


E 

z  £Oe 


z  1  [V  •  Cez]2  -  Cez  •  Vfz  +  yZmZz£Zz 

z£D 


Vz  • 

(21) 


the  sequence  (e*)  defined  in  (23)  converges  to  a  minimum  of  the 
energy  (24). 

Proof  For  reasons  of  space,  we  only  sketch  the  main  steps  of 
the  proof.  The  derivations  are  similar  to  those  in  [2,18].  Let  us 
first  show  that  if  v  <  ,  the  energy  J*  decreases.  Defining 

p  :=  z/-1(et+1  —  e*)  for  a  given  t  ^  0,  we  have  that  the  variation 
in  the  energy  can  be  bounded  by 

jV+1)  svV)  +  U(^llc-1/2v  ■  cn\\b  ~  INI w) 

^r^  +  ^M^vStK-l),  (26) 

where  IF  is  a  constant  such  that  4iG||?7||^  ^  ||c_1/2V  •  Pp\\fj  for 
all  p  G  W.  We  will  derive  this  constant  later. 

Therefore,  by  choosing  v  <  (4 StK)-1  the  energy  decreases 
with  e1 .  Let  m  =  limt^oo  J*(e*)  ^  0.  Let  us  prove  now  that 
also  converges.  Since  ||e*^||  ^  1  for  all  z  G  Oe,  z  G  D,  t  ^ 
0,  there  exists  a  subsequence  ( etk )  converging  to  e.  From  (23) 
we  see  that  (£tfc+1)  converges  as  well.  Let  e'  =  limt^^oo  etk+1 . 
Defining  p  :=  (e'  —  e)/i/,  we  see  that  in  the  limit  when  £*.—»•  oo 
J*(ef)  ^  J*(e)  +  t>  \\rj\\^viy(4:i'StK  —  1).  Since  J*(e')  =  J*(e)  =  m 
we  can  conclude  that  p  =  0  and  therefore  e  =  e' .  This  implies 
that  the  whole  sequence  (et)  converges.  Furthermore  the  limit  e 
verifies  the  Euler  equation  (22),  and  thus  is  a  minimizer  of  the 
dual  energy  J*. 

We  now  sketch  the  derivation  of  the  expression  for  K.  Let  us 
define  the  following  c- weighted  norms:  ||it||2  :=  (c~1u:  u)u  for  u  G 

u ,  11$ lie  :=  (c_1$,  $) v  for  $  S  V  and  \\A\\2CC  :=  supeev  ( ) 
for  operators  A  :  V  U.  For  any  p  G  W  we  can  write 


Adding  the  Lagrange  multipliers  corresponding  to  the  restric¬ 
tions  \\ez2\\  <  1,  JZz,zeoexD  ^zz(\\£zz  ||2  -  1),  and  deriving  with 
respect  to  £  we  get  the  Euler  equation 

mz2[-\7fz  +  vB  -  (5tV[c_1  V  •  Ce\z\  +  =  0.  (22) 

The  Karush- Kuhn- Tucker  Theorem  yields  the  existence  of  the 
Lagrange  multipliers  with  values  Xz£  =  mzg\\  —  V fz  +  v%  — 
(5tV(c_1V  •  Ce)z\\.  The  solution  of  (21)  is  computed  with  the 
semi-implicit  scheme 

et+\  =  4  z  + y  ^  [-' YP  +  VA  ~  S_P_  [c~  EAf!kl  (23) 


l|c-1/2V  ■  Cr, II*  =  || V  ■  £i7||*  ^  || V  •  \\2cc\\CVfc. 
On  one  hand  we  have  that 


\\Cv\\l  <  \\v\\w  sup 

z£Oe 


^  WvWw 


where  first  inequality  is  an  application  of  Cauchy- Schwarz,  then 
we  have  used  that  JT  a?  ^  |uz|)2,  and  the  fact  that  cz  = 

Y2 Be d  rnzz  ^  1- 

On  the  other  hand,  from  the  definition  of  the  divergence  op¬ 
erator,  it  can  be  shown  that  ||V  •  ||2C  ^  4 K.  □ 
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Computational  considerations.  Let  us  remark  that  for  the  case 
of  inpainting  ( D  =  0%)  the  domain  of  the  dual  variable  £  is 
Oe  X  OJ  C  M4,  and  therefore  the  direct  implementation  of  this 
algorithm  is  prohibitive  (for  large  images  with  large  inpainting 
domains  £  will  not  fit  in  memory).  To  circumvent  this  problem 
we  threshold  the  pixel- wise  influence  weights  mz%,  so  as  to  keep 
for  each  z  only  the  M  highest  contributions.  This  reduces  the 
size  of  e  to  M\Oe\.  Indeed,  when  h  — ►  0  the  m  funcion  is  already 
sparse  w.r.t.  z,  being  the  number  of  non-zero  elements  less  or 
equal  than  the  patch  size  (in  pixels).  In  our  experiments  we  set 
M  to  the  size  of  the  patch,  capturing  exactly  the  function  m. 


C  Computation  of  the  Nearest  Neighbor  Field 

The  computation  of  the  nearest  neighbor  (or  of  the  weight  func¬ 
tion  w)  is  the  most  time  consuming  step  of  an  exemplar-based 
algorithm.  Two  key  observations  allow  to  reduce  the  computa¬ 
tional  load  for  this  task.  The  first  one  is  that,  the  nearest  neighbor 
search  can  be  approximated  without  compromising  the  quality  of 
the  output,  allowing  to  trade  precision  for  speed  (see  for  instance 
[13]  and  references  therein).  The  second  one,  as  noted  in  [7],  is 
that  due  to  the  coherence  of  natural  images,  nearest  neighbors 
of  patches  centered  on  nearby  pixels  are  likely  to  be  located  at 
nearby  positions. 

PatchMatch  is  a  very  efficient  algorithm  for  computing  ap¬ 
proximate  nearest  neighbors  proposed  in  [7].  The  authors  in  [7] 
consider  two  regions  O  and  Oc  of  the  same  or  different  images. 
The  objective  is  to  find  for  each  patch  centered  in  x  E  O  the 
location  of  the  best  match  n(x)  E  Oc.  The  authors  define  the 
Nearest  Neighbor  Field  (NNF)  as  the  function  x  i— >  (n{x)  —  x) 
defined  over  O.  Instead  of  searching  for  the  nearest  neighbor  of 
each  query  patch  independently,  PatchMatch  computes  the  NNF 
simultaneously  for  all  query  patches.  It  exploits  the  fact  that  since 
query  patches  overlapp,  the  offset  n(x)  —  x  of  a  good  match  for 
at  x  is  likely  to  lead  to  a  good  match  for  the  adjacent  points 
of  x  as  well.  It  is  an  iterative  algorithm  which  starting  from  a 
random  initialization,  alternates  between  steps  of  propagation  of 
good  offsets  and  random  search  (see  [7]  for  details).  For  most  ap¬ 
plications  with  natural  images,  a  few  iterations  after  a  random 
initialization  are  often  sufficient. 

We  extended  this  algorithm  following  a  suggestion  in  [7] .  In¬ 
stead  of  storing  a  single  offset  for  each  query  patch,  we  store 
queues  of  L  offsets  in  an  L-Nearest  Neighbors  Field  (L-NNF). 
Thus  the  number  of  initial  random  guesses  gets  multiplied  by  L. 
During  the  propagation  step  adjacent  queues  are  merged  in  to 
a  temporal  queue  of  length  2 L  from  which  only  the  best  L  off¬ 
sets  are  kept.  This  allows  to  preserve  and  transfer  non  optimal 
offsets  that  may  in  turn  be  optimal  for  farther  positions.  Our 
implementation  of  the  random  search  is  essentially  as  in  [7] . 

The  use  of  queues  represents  a  significant  improvement  of 
performance  with  respect  to  the  original  PatchMatch.  Figure  13 
shows  the  evolution  of  the  nearest  neighbor’s  error  measured  as 
^2X£0  II Pu(x)  —  pu(n(x)) ||2  with  time,  using  queues  of  different 
lengths.  Even  L  =  2  allows  to  achieve  results  of  similar  quality 
in  less  time.  Clearly  L  cannot  grow  unbounded:  with  L  =  30  the 
first  iteration  got  to  a  higher  precision  but  it  took  more  than  10 
seconds.  In  practice  we  set  L  <  10  and  perform  a  small  number 
of  iterations. 

Let  us  remark  that  each  queue  is  an  approximated  neighbor¬ 
hood  of  the  query  patch  in  the  patch  space.  This  neighborhood 
can  be  interpreted  as  a  truncated  representation  of  the  function 
w(Xj-)  for  each  pixel  x,  which  is  useful  in  the  non-degenerated 
case  when  h  >  0  in  (6). 

There  is  an  interesting  parallelism  between  this  algorithm 
and  the  mechanic  of  Genetic  Algorithms.  The  L-NNF  problem 


Fig.  13  Performance  of  the  modified  PatchMatch.  The 

graph  shows  the  evolution  of  the  nearest  neighbor’s  error  with  the 
iterations  of  the  algorithm.  Each  graph  corresponds  to  a  different 
size  L  of  the  sets  (L  =  1  corresponds  to  the  original  PatchMatch 
[7]).  The  dots  indicate  the  completion  of  an  iteration. 


can  be  seen  as  the  simultaneous  minimization  of  many  fitness 
functions,  one  for  each  pixel  in  O.  Each  fitness  function  is  defined 
as  the  sum  of  the  patch  errors  over  the  corresponding  queue.  If 
we  identify  the  individuals  with  the  queues  for  each  pixel  in  O, 
then  the  population  for  single  problem  is  composed  by  the  queue 
at  the  pixel  and  its  immediate  neighbors  (5  individuals).  The  ran¬ 
dom  search  can  be  interpreted  as  mutations ,  the  propagation  as 
an  optimal  crossover  between  individuals  (queues).  The  popula¬ 
tion  overlap  of  neighboring  problems  responds  to  the  fact  that 
adjacent  problems  are  likely  to  the  have  similar  solutions. 
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